About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Metadata is collected from universities around the world; if you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.

A deep learning model for scene recognition

Meng, Zhaoxin January 2019 (has links)
Scene recognition is an active research topic in the field of image recognition. It deserves attention because it supports scene understanding and provides important contextual information for object recognition. Traditional approaches to scene recognition still have many shortcomings, while in recent years deep learning methods based on convolutional neural networks (CNNs) have achieved state-of-the-art results in this area. This thesis constructs a model for scene recognition based on multi-layer CNN feature extraction and transfer learning. Because scene images often contain multiple objects, the convolutional layers may hold useful local semantic information that is lost in the fully connected layers. The thesis therefore improves on the traditional CNN architecture, adopting an existing enhancement that enriches the convolutional-layer information and extracting it with Fisher Vector encoding. It then introduces transfer learning, combining knowledge from two different domains, scenes and objects, by fusing the outputs of two networks to achieve better results. The method was implemented in Python with PyTorch and applied to two well-known scene datasets, UIUC-Sports and Scene-15. Compared with the traditional AlexNet CNN architecture, accuracy improves from 81% to 93% on UIUC-Sports and from 79% to 91% on Scene-15, showing that the method performs well on scene recognition tasks.
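As an illustration of the two-network fusion this abstract describes, here is a minimal PyTorch sketch that concatenates features from an object-centric stream and a scene-centric stream before a shared linear classifier. The specific choices (AlexNet for both streams, ImageNet weights for the object stream, feature-level fusion) are assumptions for illustration, not the thesis's exact configuration; in particular, the scene stream's weights would have to come from training on scene data, which is not reproduced here.

```python
# Sketch of two-stream feature fusion for scene recognition (illustrative only).
import torch
import torch.nn as nn
from torchvision import models

class TwoStreamSceneClassifier(nn.Module):
    def __init__(self, num_scene_classes=15):
        super().__init__()
        # Object stream: ImageNet-pretrained AlexNet with the final layer
        # removed, leaving a 4096-d descriptor.
        self.object_stream = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        self.object_stream.classifier = self.object_stream.classifier[:-1]
        # Scene stream: same architecture; its weights would come from training
        # on a scene dataset (not done here).
        self.scene_stream = models.alexnet(weights=None)
        self.scene_stream.classifier = self.scene_stream.classifier[:-1]
        # Linear head over the concatenated 8192-d descriptor.
        self.head = nn.Linear(4096 * 2, num_scene_classes)

    def forward(self, x):
        f_obj = self.object_stream(x)                       # (B, 4096)
        f_scn = self.scene_stream(x)                        # (B, 4096)
        return self.head(torch.cat([f_obj, f_scn], dim=1))  # (B, num_classes)

model = TwoStreamSceneClassifier(num_scene_classes=15)
logits = model(torch.randn(2, 3, 224, 224))  # two dummy RGB images
print(logits.shape)                          # torch.Size([2, 15])
```

Whether fusion happens at the feature level, as here, or at the score level, and whether Fisher Vector encoding is applied to the convolutional maps first, are the kinds of design choices the thesis investigates.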

Generation and use of virtual agricultural environments in an immersive multiprojection system from natural scenes

Oliveira, Claiton de 05 October 2012 (has links)
The generation of virtual environments for urban or natural scenes poses a number of problems within the field of Computer Graphics, since the amount of information needed to create realistic models depends on the size and complexity of the area to be modeled. Constructing a large number of detailed natural-object models is extremely laborious: while models of artificial structures such as machines or buildings can be obtained from CAD sources, the same is not true for plants and other natural phenomena. Although many virtual environments are created by individual, manual modeling of each of their components, automatic and semi-automatic 3D reconstruction of natural environments has proved much more efficient, reducing duration, cost, and the allocation of human resources. Integrating the different technologies and tools needed to identify elements in an agricultural setting, model 3D objects, and then present and use the virtual environment in a CAVE-like system is not a trivial task. The objective of this research is therefore to develop a methodology for the automatic assembly of agricultural virtual environments based on extracting objects of real natural scenes from video images, for use in the immersive multiprojection system of the Multiuser Laboratory of 3D Immersive Visualization of Sao Carlos (MLIV). From a 3D data model projected in a system that offers a high degree of immersion and interaction, such as the MLIV, one can make comparisons with other data models or with the same model at different periods. Comparing models makes it possible to identify changes that occurred in the environment over time, both natural and man-made, assisting decision making in agricultural processes.
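The thesis's extraction pipeline is not published here, so the following is only a hedged sketch of one plausible building block: isolating moving foreground objects from video frames with OpenCV's MOG2 background subtractor. The input file name and thresholds are hypothetical.

```python
# Illustrative foreground-object extraction from video (not the thesis's code).
import cv2

cap = cv2.VideoCapture("field_scene.mp4")  # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # per-pixel foreground mask
    # Remove speckle noise before looking for object-sized regions.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:  # keep only sizable candidate objects
            x, y, w, h = cv2.boundingRect(c)
            crop = frame[y:y + h, x:x + w]
            # ... pass the crop on to a 3D-modeling stage ...
cap.release()
```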

Predictive eyes precede retrieval : visual recognition as hypothesis testing

Holm, Linus January 2007 (has links)
Does visual recognition entail verifying an idea about what is perceived? This question was addressed in the three studies of this thesis. The main hypothesis underlying the investigation was that visual recognition is an active process involving hypothesis testing. Recognition of faces (Study 1), scenes (Study 2), and objects (Study 3) was investigated using eye movement registration as a window on the recognition process. In Study 1, a functional relationship between eye movements and face recognition was established: restricting the eye movements reduced recognition performance. In addition, perceptual reinstatement, as indicated by eye movement consistency across study and test, was related to recollective experience at test. Specifically, explicit recollection was related to higher eye movement consistency than familiarity-based recognition and false rejections (Studies 1-2). Furthermore, valid expectations about a forthcoming stimulus scene produced eye movements that were more similar to those of an earlier study episode than invalid expectations did (Study 2). In Study 3, participants recognized fragmented objects embedded in nonsense fragments. Around 8 seconds prior to explicit recognition, participants began to fixate the object region rather than a similar control region in the stimulus pictures. Before participants indicated awareness of the object, they fixated it with an average of 9 consecutive fixations; that is, participants were looking at the object as if they had recognized it before they became aware of its identity. Furthermore, prior object information affected eye movement sampling of the stimulus, suggesting that semantic memory was involved in guiding the eyes during object recognition even before participants were aware of the object's presence. Collectively, the studies support the view that gaze control is instrumental to visual recognition performance and that visual recognition is an interactive process between memory representation and information sampling.
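"Eye movement consistency across study and test" can be made concrete with a simple overlap score: the proportion of test fixations that land near some study fixation. The following sketch is a generic illustration of such a measure, assuming pixel coordinates and an arbitrary 50-pixel radius; the thesis's actual metric may differ.

```python
# Hedged sketch: proportion of test fixations that revisit studied locations.
import numpy as np

def fixation_overlap(study_fix, test_fix, radius=50.0):
    """study_fix, test_fix: (N, 2) arrays of fixation coordinates in pixels."""
    study_fix = np.asarray(study_fix, dtype=float)
    test_fix = np.asarray(test_fix, dtype=float)
    # Pairwise distances from each test fixation to each study fixation.
    d = np.linalg.norm(test_fix[:, None, :] - study_fix[None, :, :], axis=2)
    revisited = d.min(axis=1) <= radius  # test fixations near a studied spot
    return revisited.mean()

study = [(100, 120), (300, 200), (220, 340)]
test = [(105, 118), (500, 90), (310, 210)]
print(f"consistency = {fixation_overlap(study, test):.2f}")  # 0.67
```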

Scene Recognition and Collision Avoidance System for Robotic Combine Harvesters Based on Deep Learning

Li, Yang 23 September 2020 (has links)
Kyoto University / 0048 / Doctoral degree by coursework (new system) / Doctor of Agricultural Science / Degree No. 甲第22784号 (農博第2427号) / Graduate School of Agriculture, Division of Environmental Science and Technology, Kyoto University / Examination committee: Prof. Michihisa Iida (chair), Prof. Naoshi Kondo, Prof. Hiroshi Nakashima / Qualified under Article 4, Paragraph 1 of the Degree Regulations / DFAM

Learning Semantic Features For Visual Recognition

Liu, Jingen 01 January 2009 (has links)
Visual recognition (e.g., object, scene, and action recognition) is an active area of research in computer vision due to its growing number of real-world applications, such as video and image indexing and search, intelligent surveillance, human-machine interaction, and robot navigation. Effective modeling of objects, scenes, and actions is critical for visual recognition. Recently, the bag of visual words (BoVW) representation, in which image patches or video cuboids are quantized into visual words (i.e., mid-level features) by clustering on appearance similarity, has been widely and successfully explored. The advantages of this representation are that no explicit detection or tracking of objects or object parts is required, that it is somewhat tolerant to within-class deformations, and that it is efficient for matching. However, the performance of BoVW is sensitive to the size of the visual vocabulary, so computationally expensive cross-validation is needed to find the appropriate quantization granularity. This limitation arises partly because the visual words are not semantically meaningful, which limits the effectiveness and compactness of the representation. To overcome these shortcomings, this thesis presents a principled approach to learning a semantic vocabulary (i.e., high-level features) from a large number of visual words (mid-level features). In this context, the thesis makes two major contributions. First, we developed an algorithm to discover a compact yet discriminative semantic vocabulary, obtained by grouping visual words into clusters based on their distribution in videos (images). The mutual information (MI) between the clusters and the videos (images) captures the discriminative power of the semantic vocabulary, while the MI between visual words and visual-word clusters measures the compactness of the vocabulary. We apply the information bottleneck (IB) algorithm to find the optimal number of visual-word clusters as a good tradeoff between compactness and discriminative power. We tested this approach on the KTH dataset and obtained an average accuracy of 94.2%. However, this approach performs one-sided clustering: only the visual words are clustered, regardless of which video they appear in. To leverage the co-occurrence of visual words and images, we developed a co-clustering algorithm that groups the visual words and images simultaneously. On the publicly available fifteen-scene dataset, this yielded about a 4% increase in average accuracy over the one-sided clustering approaches. Second, instead of grouping the mid-level features directly, we first embed them into a low-dimensional semantic space by manifold learning and then perform the clustering. We apply Diffusion Maps (DM) to capture the local geometric structure of the mid-level feature space. The DM embedding preserves the explicitly defined diffusion distance, which reflects the semantic similarity between any two features, and provides multi-scale analysis capability by adjusting the time steps in the Markov transition matrix. Experiments on the KTH dataset show that DM performs much better (about 3% to 6% improvement in average accuracy) than other manifold learning approaches and the IB method. The above methods use only a single type of feature. To combine multiple heterogeneous features for visual recognition, we further propose a Fiedler embedding to capture the complicated semantic relationships between all entities (videos, images, and heterogeneous features). The discovered relationships are then employed to further increase the recognition rate. On the Weizmann dataset, this achieved about 17%-21% improvement in average accuracy.
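For readers unfamiliar with the mid-level representation the thesis builds on, here is a minimal bag-of-visual-words sketch: local descriptors are quantized into a vocabulary with k-means, and each image becomes a histogram of visual words. The descriptor dimensionality, vocabulary size, and random data are placeholders; the thesis's semantic grouping (IB, co-clustering, diffusion maps) would then operate on top of such histograms.

```python
# Minimal bag-of-visual-words encoding (illustrative placeholder data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for SIFT-like local descriptors pooled from the training images.
descriptors = rng.normal(size=(5000, 128))

vocab_size = 200
kmeans = KMeans(n_clusters=vocab_size, n_init=4, random_state=0).fit(descriptors)

def encode(image_descriptors):
    """Encode one image's descriptors as a normalized visual-word histogram."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

bovw = encode(rng.normal(size=(300, 128)))  # one hypothetical image
print(bovw.shape)                           # (200,)
```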

AN INVESTIGATION OF SPATIAL REFERENCE FRAMES AND THE CHARACTERISTICS OF BODY-BASED INFORMATION FOR SPATIAL UPDATING

Teeter, Christopher J. 10 1900 (has links)
Successful navigation requires an accurate mental spatial representation of the environment that can be updated during movement. Experiments with animals and humans have demonstrated the existence of two forms of spatial representation: egocentric (observer-centered) and allocentric (environment-centered). However, precisely how humans use these two systems is not well understood. The current dissertation focused on providing evidence differentiating human use of egocentric and allocentric spatial reference frames, specifically examining the characteristics and contributions of body-based sources. Two empirical chapters are presented that include experiments involving two common spatial tasks. In Chapter 2, updating of feature relations within a room-sized environment was examined by having observers provide directional judgments to learned features with respect to an imagined orientation that was either congruent or incongruent with their physical orientation. The information available for updating the physical orientation was manipulated across experiments. Performance differences between congruent and incongruent conditions demonstrated the reliance on egocentric representations for updating, and differentiated body- and knowledge-based components of the egocentric updating process. The specificity of the body-based component was examined in Chapter 3 by having observers detect changes made to a tabletop spatial scene following a viewpoint shift resulting from their own movement, scene rotation, or both. The relation between the extent of observer movement and the magnitude of the experienced viewpoint shift was manipulated. Change detection performance was best when the extent of observer movement most closely matched the viewpoint shift, and declined as the match declined. Thus, body-based cues contributed specific information for updating self-to-feature relations that facilitated scene recognition. Throughout the course of the research program it has become clear that humans rely on egocentric representations to complete these tasks, and that sensory and motor modalities involved in self-motion are integrated for updating spatial relations of novel environments. / Doctor of Philosophy (PhD)

Embedded multimodal scene recognition

Blachon, David 29 February 2016 (has links)
Context: This PhD takes place in the context of ambient intelligence and (mobile) context/scene awareness. Historically, the project comes from the company ST-Ericsson, which needed to develop and embed a "context server" on the smartphone that would gather and provide context information to applications requesting it. One use case was given for illustration: when someone in a work meeting receives a call, then thanks to its understanding of the current scene (a meeting at work), the smartphone can automatically react and, in this case, switch to vibrate mode so as not to disturb the meeting. The main problems are (i) proposing a definition of what a scene is and which examples of scenes suit the use case, (ii) acquiring a corpus of data to be exploited with machine learning approaches, and (iii) proposing algorithmic solutions to the problem of scene recognition.

Data collection: A review of existing databases showed that none fitted the criteria set for this work (long continuous recordings; multi-source, synchronized recordings necessarily including audio; relevant labels). Hence, an Android application for collecting data, called RecordMe, was developed; it has been successfully tested on more than 10 devices running Android 2.3 and 4.0. It was used for several campaigns, including the one for scenes, resulting in more than 500 hours recorded by more than 25 volunteers, mostly in the Grenoble area but also abroad (Dublin, Singapore, Budapest). The application and the collection protocol both include measures for protecting volunteers' privacy: for instance, raw audio is not saved; instead, MFCC coefficients are saved, and sensitive strings (GPS coordinates, device IDs) are hashed on the phone.

Scene definition: The study of existing work on scene recognition, along with analysis of the annotations provided by the volunteers during data collection, led to a definition of a scene as a generalization of a situation, composed of a place and an action performed by one person (the smartphone owner). Examples of scenes include taking transportation, attending a work meeting, and walking in the street. The composition makes it possible to provide several kinds of information about the current scene. The definition is still quite generic, however, and could be completed with additional information, integrated as new elements of the composition.

Algorithmics: Experiments involved both supervised and unsupervised machine learning techniques. The supervised part is classification, following a fairly standard method: find relevant descriptors of the data through an attribute selection method, then train and test several classifiers (J48 and Random Forest trees, GMMs, HMMs, and DNNs). A two-stage system was also tried, composed of classifiers trained to identify intermediate concepts, whose predictions are merged to estimate the most likely scene. The unsupervised part of the work aimed at extracting information from the data: a bottom-up hierarchical clustering based on the EM algorithm was applied to the acceleration and audio data, taken separately and together. One of the results is the separation of the acceleration data into groups based on the amount of agitation.
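As a hedged illustration of the unsupervised step, the sketch below fits a Gaussian mixture with EM to simple per-window acceleration features (mean and variance of the magnitude), which is one way agitation-based groups could emerge. The window length, feature choice, and synthetic data are assumptions, not the thesis's pipeline.

```python
# EM-based clustering of acceleration windows by agitation (illustrative).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic tri-axial acceleration: a calm segment then an agitated one.
calm = rng.normal(0.0, 0.05, size=(2000, 3))
agitated = rng.normal(0.0, 0.8, size=(2000, 3))
signal = np.vstack([calm, agitated])

def window_features(acc, win=100):
    """Mean and variance of the acceleration magnitude per window."""
    mag = np.linalg.norm(acc, axis=1)
    windows = mag[: len(mag) // win * win].reshape(-1, win)
    return np.column_stack([windows.mean(axis=1), windows.var(axis=1)])

X = window_features(signal)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
labels = gmm.predict(X)
print(labels[:5], labels[-5:])  # the calm and agitated halves separate
```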

A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments

Casserfelt, Karl January 2018 (has links)
The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to nearly solve the problem of object recognition in image data. One of the next big challenges in computer vision is to allow computers to recognize not only objects but also activities. This study explores the capabilities of deep learning for the specific problem of activity recognition in office environments. It used a re-labeled subset of the AMI Meeting Corpus video data set to comparatively evaluate the performance of different neural network models in this problem area, and then evaluated the best-performing model on a new data set of office activities captured in a research lab at Malmö University. The results showed that the best-performing model was a 3D convolutional neural network (3DCNN) with temporal information in the third dimension; however, a recurrent convolutional network (RCNN) using a pre-trained VGG16 model to extract features, fed into a recurrent neural network with a unidirectional Long Short-Term Memory (LSTM) layer, performed almost as well with the right configuration. An analysis of the results suggests that a 3DCNN's performance depends on the camera angle, specifically on how well movement is spatially distributed between the people in frame.
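The RCNN variant described above can be sketched in a few lines of PyTorch: a frozen, pre-trained VGG16 backbone extracts a feature vector per frame, a unidirectional LSTM consumes the frame sequence, and the last hidden state is classified. The hidden size, pooling, and classifier head are illustrative assumptions, not the study's exact configuration.

```python
# Sketch of a VGG16-feature + unidirectional-LSTM activity classifier.
import torch
import torch.nn as nn
from torchvision import models

class VGG16LSTM(nn.Module):
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.backbone = vgg.features          # conv layers only
        for p in self.backbone.parameters():  # freeze the pre-trained extractor
            p.requires_grad = False
        self.pool = nn.AdaptiveAvgPool2d(1)   # -> (B*T, 512, 1, 1)
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):                  # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        frames = clip.flatten(0, 1)           # treat frames as one big batch
        feats = self.pool(self.backbone(frames)).flatten(1)  # (B*T, 512)
        feats = feats.view(b, t, 512)
        _, (h_n, _) = self.lstm(feats)        # h_n: (1, B, hidden)
        return self.head(h_n[-1])             # (B, num_classes)

model = VGG16LSTM(num_classes=5)
out = model(torch.randn(2, 8, 3, 224, 224))   # 2 clips of 8 frames each
print(out.shape)                              # torch.Size([2, 5])
```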

CHANGE DETECTION OF A SCENE FOLLOWING A VIEWPOINT CHANGE: MECHANISMS FOR THE REDUCED PERFORMANCE COST WHEN THE VIEWPOINT CHANGE IS CAUSED BY VIEWER LOCOMOTION

Comishen, Michael A. 10 1900 (has links)
When an observer detects changes in a scene from a viewpoint that differs from the learned viewpoint, a viewpoint change caused by the observer's locomotion leads to better recognition performance than a viewpoint change caused by an equivalent movement of the scene. Such a benefit of observer locomotion could arise from spatial updating through body-based information (Simons and Wang, 1998) or from knowledge of the change of reference direction gained through locomotion (Mou et al., 2009). The effect of reference direction information has been demonstrated through a visual cue (e.g., a chopstick) presented during the testing phase indicating the original learning viewpoint (Mou et al., 2009).

In the current study, we re-examined the mechanisms behind this benefit of observer locomotion. Six experiments were performed using a similar change detection paradigm. Experiments 1 and 2 adopted the design of Mou et al. (2009). The results were inconsistent with those of Mou et al. (2009): even with the visual indicator, performance (accuracy and response time) in the table rotation condition was still significantly worse than in the observer locomotion condition. In Experiments 3-5, we compared performance in the normal walking condition with conditions where the body-based information may not be reliable (disorientation, or walking over a long path). The results again showed a lack of benefit from the visual indicator. Experiment 6 introduced a more salient and intrinsic reference direction: coherent object orientations. Unlike in the previous experiments, performance in the scene rotation condition was similar to that in the observer locomotion condition.

Overall, we showed that body-based information from observer locomotion may be the most prominent information. Knowledge of the reference direction can be useful but might only be effective in limited scenarios, such as a scene with a dominant orientation. / Master of Science (MSc)
