1

Recognizing Indoor Scenes

Torralba, Antonio, Sinha, Pawan 25 July 2001 (has links)
We propose a scheme for indoor place identification based on the recognition of global scene views. Scene views are encoded using a holistic representation that provides low-resolution spatial and spectral information. The holistic nature of the representation dispenses with the need to rely on specific objects or local landmarks and also renders it robust against variations in object configurations. We demonstrate the scheme on the problem of recognizing scenes in video sequences captured while walking through an office environment. We develop a method for distinguishing between 'diagnostic' and 'generic' views and also evaluate changes in system performance as a function of the amount of training data available and the complexity of the representation.
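A holistic, low-resolution encoding of a global scene view can be sketched as follows. This is a toy illustration only: the grid size and the features (block means plus a crude gradient-energy proxy for spectral content) are assumptions for exposition, not the authors' actual representation.

```python
# Toy sketch of a holistic, low-resolution scene representation:
# summarize an image by coarse per-block statistics rather than by
# objects or landmarks. Grid size and features are illustrative.

def holistic_descriptor(image, grid=4):
    """Encode a grayscale image (list of rows) as a coarse grid of
    (mean intensity, horizontal gradient energy) pairs."""
    h, w = len(image), len(image[0])
    bh, bw = h // grid, w // grid
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            block = [image[y][x]
                     for y in range(gy * bh, (gy + 1) * bh)
                     for x in range(gx * bw, (gx + 1) * bw)]
            mean = sum(block) / len(block)
            # crude "spectral" proxy: energy of horizontal differences
            energy = sum((image[y][x + 1] - image[y][x]) ** 2
                         for y in range(gy * bh, (gy + 1) * bh)
                         for x in range(gx * bw, (gx + 1) * bw - 1))
            feats.append((mean, energy))
    return feats

# A flat 8x8 image has zero gradient energy in every block:
flat = [[5] * 8 for _ in range(8)]
desc = holistic_descriptor(flat, grid=4)
print(desc[0])  # (5.0, 0)
```

Because the descriptor is global and coarse, two office views with rearranged furniture yield similar feature vectors, which is the robustness property the abstract emphasizes.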
2

Texture analysis of high resolution panchromatic imagery for terrain classification

Humphrey, Matthew Donald 06 1900 (has links)
Approved for public release, distribution is unlimited / Terrain classification is studied here using the tool of texture analysis of high-spatial resolution panchromatic imagery. This study analyzes the impact and effectiveness of texture analysis on terrain classification within the Elkhorn Slough Estuary and surrounding farmlands within the central California coastal region. Ikonos panchromatic (1 meter) and multispectral (4 meter) imagery data are examined to determine the impact of adding texture analysis to the standard MSI classification approaches. Spectral Angle Mapper and Maximum Likelihood classifiers are used. Overall accuracy rates increased with the addition of the texture processing. The classification accuracy rate rose from 81.0% for the MSI data to 83.9% when the additional texture measures were added. Modest accuracy (55%) was obtained from texture analysis alone. The addition of textural data also enhanced the classifier's ability to discriminate between several different woodland classes contained within the image. / Lieutenant Commander, United States Navy
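One classic family of texture measures used in this kind of study is derived from the gray-level co-occurrence matrix (GLCM); the sketch below shows the GLCM contrast statistic. The offset, quantization levels, and choice of statistic are illustrative assumptions, not necessarily the study's exact configuration.

```python
# Minimal gray-level co-occurrence matrix (GLCM) texture sketch.
# Offset (dx, dy), number of gray levels, and the contrast statistic
# are illustrative choices, not the study's exact parameters.

def glcm(image, dx=1, dy=0, levels=4):
    """Count co-occurrences of gray levels at offset (dx, dy)."""
    m = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                m[image[y][x]][image[y2][x2]] += 1
    return m

def contrast(m):
    """GLCM contrast: (i - j)^2 weighted by normalized co-occurrence counts."""
    total = sum(sum(row) for row in m)
    return sum((i - j) ** 2 * c / total
               for i, row in enumerate(m) for j, c in enumerate(row))

stripes = [[0, 3, 0, 3]] * 4   # high-contrast vertical stripes
uniform = [[1, 1, 1, 1]] * 4   # no texture at all
print(contrast(glcm(stripes)))  # 9.0
print(contrast(glcm(uniform)))  # 0.0
```

Appending such texture statistics as extra bands to the multispectral data is one way the per-pixel spectral classifiers described above can be given spatial context.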
3

Multiple Drone Detection and Acoustic Scene Classification with Deep Learning

Vemula, Hari Charan January 2018 (has links)
No description available.
4

Multi angle imaging with spectral remote sensing for scene classification

Prasert, Sunyaruk 03 1900 (has links)
Approved for public release, distribution is unlimited / Fine discrimination of similar soil classes was produced by the BRDF variations in the high-spatial resolution panchromatic image. Texture analysis results depended on the directionality of the gray level co-occurrence matrix (GLCM) calculation. Combining the different modalities of analysis did not improve the overall classification, perhaps illustrating the consequences of the Hughes paradox (Hughes, 1968). / Flight Lieutenant, Royal Thai Air Force
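The direction dependence of co-occurrence statistics noted above can be seen with a tiny count of differing pixel pairs: vertical stripes look rough under a horizontal offset but perfectly smooth under a vertical one. The offsets and the two-level image are illustrative only.

```python
# Direction dependence of co-occurrence texture: the same image gives
# very different statistics for horizontal vs. vertical offsets.

def transitions(image, dx, dy):
    """Count pixel pairs at offset (dx, dy) whose gray levels differ."""
    h, w = len(image), len(image[0])
    return sum(1
               for y in range(h) for x in range(w)
               if 0 <= x + dx < w and 0 <= y + dy < h
               and image[y][x] != image[y + dy][x + dx])

stripes = [[0, 1, 0, 1]] * 4
print(transitions(stripes, 1, 0))  # 12 (every horizontal neighbour differs)
print(transitions(stripes, 0, 1))  # 0  (columns are constant)
```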
5

Methods and systems for vision-based proactive applications

Huttunen, S. (Sami) 22 November 2011 (has links)
Abstract Human-computer interaction (HCI) is an integral part of modern society. Since the number of technical devices around us is increasing, the way of interacting is changing as well. The systems of the future should be proactive, so that they can adapt and adjust to people’s movements and actions without requiring any conscious control. Visual information plays a vital role in this kind of implicit human-computer interaction due to its expressiveness. It is therefore obvious that cameras equipped with computing power and computer vision techniques provide an unobtrusive way of analyzing human intentions. Despite its many advantages, use of computer vision is not always straightforward. Typically, every application sets specific requirements for the methods that can be applied. Given these motivations, this thesis aims to develop new vision-based methods and systems that can be utilized in proactive applications. As a case study, the thesis covers two different proactive computer vision applications. Firstly, an automated system that takes care of both the selection and switching of the video source in a distance education situation is presented. The system is further extended with a pan-tilt-zoom camera system that is designed to track the teacher when s/he walks at the front of the classroom. The second proactive application is targeted at mobile devices. The system presented recognizes landscape scenes which can be utilized in automatic shooting mode selection. Distributed smart cameras have been an active area of research in recent years, and they play an important role in many applications. Most of the research has focused on either the computer vision algorithms or on a specific implementation. There has been less activity on building generic frameworks which allow different algorithms, sensors and distribution methods to be used. 
In this field, the thesis presents an open and extensible framework for the development of distributed sensor networks, with an emphasis on peer-to-peer networking. From the methodological point of view, the thesis makes its contribution to the field of multi-object tracking. The method presented utilizes soft assignment to associate the measurements with the objects tracked. In addition, the thesis also presents two different ways of extracting location measurements from images. As a result, the proposed method provides the locations and trajectories of multiple objects, which can be utilized in proactive applications.
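The soft-assignment idea described above, where each measurement receives a probability of belonging to every tracked object instead of a hard one-to-one match, can be sketched as follows. The Gaussian likelihood model and the sigma parameter are illustrative assumptions, not the thesis's implementation.

```python
import math

# Hedged sketch of soft assignment for multi-object tracking: each
# measurement is given normalized weights over all tracked objects.
# The Gaussian model and sigma are illustrative assumptions.

def soft_assign(measurements, tracks, sigma=1.0):
    """Return, per (x, y) measurement, normalized weights over tracks."""
    weights = []
    for m in measurements:
        likes = [math.exp(-((m[0] - t[0]) ** 2 + (m[1] - t[1]) ** 2)
                          / (2 * sigma ** 2)) for t in tracks]
        s = sum(likes)
        weights.append([l / s for l in likes])
    return weights

# A measurement exactly halfway between two tracks splits its weight evenly:
w = soft_assign([(0.5, 0.0)], [(0.0, 0.0), (1.0, 0.0)])
print(w[0])  # [0.5, 0.5]
```

Unlike a hard nearest-neighbour assignment, the soft weights let an ambiguous measurement contribute to the state updates of several nearby tracks at once.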
6

Mutual Enhancement of Environment Recognition and Semantic Segmentation in Indoor Environment

Challa, Venkata Vamsi January 2024 (has links)
Background: The dynamic field of computer vision and artificial intelligence has continually evolved, pushing the boundaries in areas like semantic segmentation and environmental recognition, pivotal for indoor scene analysis. This research investigates the integration of these two technologies, examining their synergy and implications for enhancing indoor scene understanding. The application of this integration spans various domains, including smart home systems for enhanced ambient living, navigation assistance for cleaning robots, and advanced surveillance for security. Objectives: The primary goal is to assess the impact of integrating semantic segmentation data on the accuracy of environmental recognition algorithms in indoor environments. Additionally, the study explores how environmental context can enhance the precision and accuracy of contour-aware semantic segmentation. Methods: The research employed an extensive methodology, utilizing various machine learning models, including standard algorithms, Long Short-Term Memory networks, and ensemble methods. Transfer learning with models like EfficientNet B3, MobileNetV3 and Vision Transformer was a key aspect of the experimentation. The experiments were designed to measure the effect of semantic segmentation on environmental recognition and its reciprocal influence. Results: The findings indicated that the integration of semantic segmentation data significantly enhanced the accuracy of environmental recognition algorithms. Conversely, incorporating environmental context into contour-aware semantic segmentation led to notable improvements in precision and accuracy, reflected in metrics such as Mean Intersection over Union (MIoU). Conclusion: This research underscores the mutual enhancement between semantic segmentation and environmental recognition, demonstrating how each technology significantly boosts the effectiveness of the other in indoor scene analysis. 
The integration of semantic segmentation data notably elevates the accuracy of environmental recognition algorithms, while the incorporation of environmental context into contour-aware semantic segmentation substantially improves its precision and accuracy. The results also open avenues for advancements in automated annotation processes, paving the way for smarter environmental interaction.
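Mean Intersection over Union (MIoU), the segmentation metric cited above, is easy to state concretely: per-class IoU is the intersection of predicted and labelled pixels over their union, averaged across classes. The sketch below is a minimal illustration, not the thesis code.

```python
# Mean Intersection over Union (MIoU) over flat per-pixel label lists.
# Classes absent from both prediction and label are skipped.

def mean_iou(pred, label, classes):
    """pred, label: equal-length lists of per-pixel class ids."""
    ious = []
    for c in classes:
        inter = sum(1 for p, l in zip(pred, label) if p == c and l == c)
        union = sum(1 for p, l in zip(pred, label) if p == c or l == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred  = [0, 0, 1, 1]
label = [0, 1, 1, 1]
# class 0: IoU 1/2, class 1: IoU 2/3, mean = 7/12
print(mean_iou(pred, label, [0, 1]))
```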
7

Natural scene classification, annotation and retrieval : developing different approaches for semantic scene modelling based on Bag of Visual Words

Alqasrawi, Yousef T. N. January 2012 (has links)
With the availability of inexpensive hardware and software, digital imaging has become an important medium of communication in our daily lives. Huge numbers of digital images are being collected and made available through the internet, and stored in various settings such as personal image collections, medical imaging, digital arts, etc. Therefore, it is important to make sure that images are stored, searched and accessed in an efficient manner. The use of the bag of visual words (BOW) model for modelling images, based on local invariant features computed at interest point locations, has become a standard choice for many computer vision tasks. Based on this promising model, this thesis investigates three main problems: natural scene classification, annotation and retrieval. Given an image, the task is to design a system that can determine which class the image belongs to (classification), what semantic concepts it contains (annotation) and which images it is most similar to (retrieval). This thesis contributes to scene classification by proposing a weighting approach, named the keypoints density-based weighting method (KDW), to control the fusion of colour information and bag of visual words on a spatial pyramid layout in a unified framework. Different configurations of BOW, integrated visual vocabularies and multiple image descriptors are investigated and analyzed. The proposed approaches are extensively evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, using 10-fold cross validation. The second contribution of this thesis, the scene annotation task, explores whether the integrated visual vocabularies generated for scene classification can be used to model the local semantic information of natural scenes. 
In this direction, image annotation is treated as a classification problem in which images are partitioned into a fixed 10x10 grid and each block, represented by BOW and different image descriptors, is classified into one of the predefined semantic classes. An image is then represented by the percentage of every semantic concept detected in it. Experimental results on 6 scene categories demonstrate the effectiveness of the proposed approach. Finally, this thesis further explores, with extensive experimental work, the use of different configurations of the BOW for natural scene retrieval. / Applied Science University in Jordan
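The grid-based annotation scheme described above, where each block is classified and the image is then summarized by the fraction of blocks per semantic concept, can be sketched as follows. The block classifier here is a hypothetical stand-in; in the thesis each block would be represented by BOW features and classified into the predefined semantic classes.

```python
from collections import Counter

# Sketch of grid-based image annotation: classify each block, then
# represent the image by the fraction of blocks per semantic concept.
# The classifier below is a toy stand-in, not the thesis's classifier.

def annotate(blocks, classify):
    """blocks: iterable of per-block features; classify: block -> concept."""
    labels = [classify(b) for b in blocks]
    counts = Counter(labels)
    n = len(labels)
    return {concept: c / n for concept, c in counts.items()}

# Toy stand-in: mean intensity decides "sky" vs "grass".
blocks = [[200, 210], [220, 230], [40, 50], [60, 70]]
hist = annotate(blocks, lambda b: "sky" if sum(b) / len(b) > 128 else "grass")
print(hist)  # {'sky': 0.5, 'grass': 0.5}
```

The resulting concept histogram is itself a fixed-length vector, so the same image representation can feed classification or retrieval directly.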
8

Detekce Akustického Prostředí z Řeči / Acoustic Scene Classification from Speech

Dobrotka, Matúš January 2018 (has links)
The topic of this thesis is the classification of audio recordings into 15 acoustic scene classes that represent common scenes and places where people are situated on a regular basis. The thesis describes two approaches, based on GMMs and i-vectors, and a fusion of the two. The score of the best GMM system, evaluated on the evaluation dataset of the DCASE Challenge, is 60.4%. The best i-vector system's score is 68.4%. The fusion of the GMM system and the best i-vector system achieves a score of 69.3%, which would correspond to 20th place in the overall ranking of the DCASE 2017 Challenge (among 98 submitted systems from all over the world).
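Score-level fusion of two systems, as used above, is often just a weighted combination of their per-class scores followed by an argmax. The sketch below illustrates the idea; the weight and the score values are made up for illustration, and the thesis's actual fusion and calibration may differ.

```python
# Illustrative score-level fusion of two classifiers: a weighted sum
# of per-class scores. Weight alpha and the scores are made-up values.

def fuse(scores_a, scores_b, alpha=0.5):
    """Combine two per-class score lists, weighting system A by alpha."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(scores_a, scores_b)]

gmm     = [0.2, 0.5, 0.3]   # hypothetical per-class scores, system A
ivector = [0.1, 0.3, 0.6]   # hypothetical per-class scores, system B
fused = fuse(gmm, ivector, alpha=0.4)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # class 2 wins after fusion
```

Fusion helps when the two systems make different kinds of errors: the combined scores can pick the right class even where each system alone is uncertain.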
9

Effective and efficient visual description based on local binary patterns and gradient distribution for object recognition

Zhu, Chao 03 April 2012 (has links)
This thesis is dedicated to the problem of machine-based visual object recognition, which has become a very popular and important research topic in recent years because of its wide range of applications, such as image/video indexing and retrieval, security access control, video monitoring, etc. Despite much effort and progress over the past years, it remains an open problem and is still considered one of the most challenging problems in the computer vision community, mainly due to inter-class similarities and intra-class variations like occlusion, background clutter, and changes in viewpoint, pose, scale and illumination. The popular approaches for object recognition nowadays are feature-and-classifier based: they typically extract visual features from images/videos first, and then perform the classification using machine learning algorithms based on the extracted features. It is thus important to design a good visual description, which should be both discriminative and computationally efficient, while possessing some robustness against the previously mentioned variations. In this context, the objective of this thesis is to propose innovative contributions to the task of visual object recognition, in particular several new visual features and descriptors which effectively and efficiently represent the visual content of images/videos for object recognition. The proposed features and descriptors aim to capture visual information from different aspects. Firstly, we propose six multi-scale color local binary pattern (LBP) features to deal with the main shortcomings of the original LBP, namely its deficiency of color information and its sensitivity to non-monotonic lighting condition changes. 
By extending the original LBP to a multi-scale form in different color spaces, the proposed features not only have more discriminative power, by obtaining more local information, but also possess certain invariance properties to different lighting condition changes. In addition, their performance is further improved by applying a coarse-to-fine image division strategy, calculating the proposed features within image blocks in order to encode the spatial information of texture structures. These features capture the global distribution of texture information in images. Secondly, we propose a new dimensionality reduction method for LBP called the orthogonal combination of local binary patterns (OC-LBP), and adopt it to construct a new distribution-based local descriptor in a way similar to SIFT. Our goal is to build a more efficient local descriptor by replacing the costly gradient information with local texture patterns in the SIFT scheme. As an extension of our first contribution, we also extend the OC-LBP descriptor to different color spaces and propose six color OC-LBP descriptors to enhance the discriminative power and the photometric invariance of the intensity-based descriptor. These descriptors capture the local distribution of texture information in images. Thirdly, we introduce DAISY, a new fast local descriptor based on gradient distribution, to the domain of visual object recognition.
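The building block that the descriptors above extend is the basic 8-neighbour LBP code: each neighbour is thresholded against the center pixel and the results form one byte. This is a bare-bones sketch of the standard operator; the multi-scale, color-space, and OC-LBP variants proposed in the thesis are omitted.

```python
# Bare-bones 8-neighbour local binary pattern (LBP) code for one pixel:
# threshold each neighbour against the center and pack the bits.

def lbp_code(image, y, x):
    """LBP code of the interior pixel (y, x) of a grayscale image."""
    center = image[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

img = [[9, 9, 9],
       [1, 5, 1],
       [1, 1, 1]]
print(lbp_code(img, 1, 1))  # top row >= 5 sets bits 0..2 -> 7
```

Because the code depends only on comparisons with the center pixel, it is invariant to any monotonic intensity change, which is exactly the property whose limits (non-monotonic lighting changes, missing color) motivate the color extensions above.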