11 |
Visuo-Haptic recognition of daily-life objects : a contribution to the data scarcity problem / Reconnaissance visio-haptique des objets de la vie quotidienne : à partir de peu de données d'entraînement
Abderrahmane, Zineb 29 November 2018
Il est important pour les robots de pouvoir reconnaître les objets rencontrés dans la vie quotidienne afin d’assurer leur autonomie. De nos jours, les robots sont équipés de capteurs sophistiqués permettant d’imiter le sens humain du toucher. C’est ce qui permet aux robots interagissant avec les objets de percevoir les propriétés (telles la texture, la rigidité et la matière) nécessaires pour leur reconnaissance. Dans cette thèse, notre but est d’exploiter les données haptiques issues de l’interaction robot-objet afin de reconnaître les objets de la vie quotidienne, et cela en utilisant les algorithmes d’apprentissage automatique. Le problème qui se pose est la difficulté de collecter suffisamment de données haptiques afin d’entraîner les algorithmes d’apprentissage supervisé sur tous les objets que le robot doit reconnaître. En effet, les objets de la vie quotidienne sont nombreux et l’interaction physique entre le robot et chaque objet pour la collection des données prend beaucoup de temps et d’efforts. Pour traiter ce problème, nous développons un système de reconnaissance haptique permettant de reconnaître des objets à partir d'aucune, de une seule, ou de plusieurs données d’entraînement. Enfin, nous intégrons la vision afin d’améliorer la reconnaissance d'objets lorsque le robot est équipé de caméras. / Recognizing surrounding objects is an important skill for the autonomy of robots performing in daily-life. Nowadays robots are equipped with sophisticated sensors imitating the human sense of touch. This allows the recognition of an object based on information ensuing from robot-object physical interaction. Such information can include the object texture, compliance and material. In this thesis, we exploit haptic data to perform haptic recognition of daily life objects using machine learning techniques. The main challenge faced in our work is the difficulty of collecting a fair amount of haptic training data for all daily-life objects. 
This is due to the continuously growing number of objects and to the effort and time needed by the robot to physically interact with each object for data collection. We solve this problem by developing a haptic recognition framework capable of performing Zero-shot, One-shot and Multi-shot Learning. We also extend our framework by integrating vision to enhance the robot’s recognition performance, whenever such sense is available.
|
12 |
Attribute learning for image/video understanding
Fu, Yanwei January 2015
For the past decade, computer vision research has achieved increasing success in visual recognition, including object detection and video classification. Nevertheless, these achievements still cannot meet the urgent needs of image and video understanding. The recent rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. In particular, these types of media data usually contain very complex social activities of a group of people (e.g. a YouTube video of a wedding reception) and are captured by consumer devices with poor visual quality. Thus it is extremely challenging to automatically understand such a high number of complex image and video categories, especially when these categories have never been seen before. One way to understand categories with no or few examples is transfer learning, which transfers knowledge across related domains, tasks, or distributions. In particular, lifelong learning, which aims at transferring information to tasks without any observed data, has recently become popular. In computer vision, transfer learning often takes the form of attribute learning. The key idea underpinning attribute learning is to exploit transfer learning via an intermediate-level semantic representation: attributes. Semantic attributes are most commonly used as a semantically meaningful bridge between low-level feature data and higher-level class concepts, since they can be used both descriptively (e.g., ’has legs’) and discriminatively (e.g., ’cats have it but dogs do not’). Previous works propose many different attribute learning models for image and video understanding. However, several intrinsic limitations and problems exist in previous attribute learning work.
The limitations discussed in this thesis include those of user-defined attributes, projection domain-shift problems, prototype sparsity problems, the inability to combine multiple semantic representations, and noisy annotations of relative attributes. To tackle these limitations, this thesis explores attribute learning for image and video understanding from the following three aspects. Firstly, to break the limitations of user-defined attributes, a framework for learning latent attributes is presented for automatic classification and annotation of unstructured group social activity in videos, which enables attribute learning for understanding complex multimedia data with sparse and incomplete labels. We investigate the learning of latent attributes for content-based understanding, which aims to model and predict classes and tags relevant to objects, sounds and events – anything likely to be used by humans to describe or search for media. Secondly, we propose the framework of transductive multi-view embedding hypergraph label propagation and solve three inherent limitations of most previous attribute learning work, i.e., the projection domain shift problems, the prototype sparsity problems and the inability to combine multiple semantic representations. We explore the manifold structure of the data distributions of different views projected onto the same embedding space via label propagation on a graph. Thirdly, a novel framework for robust learning is presented to effectively learn relative attributes from extremely noisy and sparse annotations. Relative attributes are increasingly learned from pairwise comparisons collected via crowdsourcing tools, which are more economical and scalable than conventional laboratory-based data annotation. However, a major challenge in adopting a crowdsourcing strategy is the detection and pruning of outliers.
We thus propose a principled way to identify annotation outliers by formulating the relative attribute prediction task as a unified robust learning to rank problem, tackling both the outlier detection and relative attribute prediction tasks jointly. In summary, this thesis studies and solves the key challenges and limitations of attribute learning in image/video understanding. We show the benefits of solving these challenges and limitations in our approach which thus achieves better performance than previous methods.
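The role of attributes as a bridge between low-level features and class concepts can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes that some attribute predictor has already produced scores for an image, and it simply assigns the image to the class whose attribute signature is closest. The attribute names, the prototypes and the function `zero_shot_classify` are hypothetical and chosen only for illustration.

```python
# Hypothetical attribute vocabulary, in the order used by every vector below.
ATTRIBUTES = ("has_stripes", "has_legs", "lives_in_water")

# Class prototypes: attribute signatures written down by hand, so a class
# can be recognized without a single training image of that class.
PROTOTYPES = {
    "zebra": (1.0, 1.0, 0.0),
    "horse": (0.0, 1.0, 0.0),
    "dolphin": (0.0, 0.0, 1.0),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def zero_shot_classify(attribute_scores, prototypes=PROTOTYPES):
    """Return the class whose attribute prototype is nearest to the scores."""
    return min(
        prototypes,
        key=lambda name: squared_distance(attribute_scores, prototypes[name]),
    )


# Scores that an (assumed) attribute predictor might output for one image.
predicted = (0.9, 0.8, 0.1)
print(zero_shot_classify(predicted))
```

Because the prototypes are specified at the attribute level, adding a new class only requires a new attribute signature, which is what makes recognition of previously unseen categories possible in this simplified setting.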
|
13 |
The Use of Stereoscopic Cues in the Perception of Noise Masked Images of Natural Objects
de la Rosa, Stephan 31 July 2008
When seen through a stereoscope, a Gabor pattern (a Gaussian enveloped sinusoid) that is masked by visual noise is more readily detectable when it appears in front of or behind the noise than when it is embedded in the noise itself. The enhanced visibility brought about by stereo cues is referred to as binocular unmasking. In this work, we investigated whether binocular unmasking may also occur with visual objects more complex than simple Gabor patterns, and with tasks more demanding than detection. Specifically, we examined the effects of binocular unmasking in the detection, categorization, and identification of noise masked images of natural objects. We observed the occurrence of binocular unmasking in all three tasks. However, the size of this effect was greater for detection performance than for categorization or identification performance; the latter two benefited to the same extent from the availability of stereoscopic cues.
We argue that these results suggest that low level stereoscopic depth cues may play a helpful role, not only in simple detection tasks with psychophysical stimuli, but also in the perception of complex stimuli depicting natural objects.
|
15 |
Visual feature graphs and image recognition / Graphes d'attributs et reconnaissance d'images
Behmo, Régis 15 September 2010
La problèmatique dont nous nous occupons dans cette thèse est la classification automatique d'images bidimensionnelles, ainsi que la détection d'objets génériques dans des images. Les avancées de ce champ de recherche contribuent à l'élaboration de systèmes intelligents, tels que des robots autonomes et la création d'un web sémantique. Dans ce contexte, la conception de représentations d'images et de classificateurs appropriés constituent des problèmes ambitieux. Notre travail de recherche fournit des solutions à ces deux problèmes, que sont la représentation et la classification d'images. Afin de générer notre représentation d'image, nous extrayons des attributs visuels de l'image et construisons une structure de graphe basée sur les propriétés liées au relations de proximités entre les points d'intérêt associés. Nous montrons que certaines propriétés spectrales de ces graphes constituent de bons invariants aux classes de transformations géométriques rigides. Notre représentation d'image est basée sur ces propriétés. Les résultats expérimentaux démontrent que cette représentation constitue une amélioration par rapport à d'autres représentations similaires, mais qui n'intègrent pas les informations liées à l'organisation spatiale des points d'intérêt. Cependant, un inconvénient de cette méthode est qu'elle fait appel à une quantification (avec pertes) de l'espace des attributs visuels afin d'être combinée avec un classificateur Support Vecteur Machine (SVM) efficace. Nous résolvons ce problème en créant un nouveau classificateur, basé sur la distance au plus proche voisin, et qui permet la classification d'objets assimilés à des ensembles de points. La linéarité de ce classificateur nous permet également de faire de la détection d'objet, en plus de la classification d'images. Une autre propriété intéressante de ce classificateur est sa capacité à combiner différents types d'attributs visuels de manière optimale. 
Nous utilisons cette propriété pour formuler le problème de classification de graphes de manière différente. Les expériences, menées sur une grande variété de jeux de données, montrent les bénéfices quantitatifs de notre approche. / This thesis is concerned with the problem of automated 2D image classification and general object detection. Advances in this field of research contribute to the elaboration of intelligent systems such as, but not limited to, autonomous robots and the semantic web. In this context, designing adequate image representations, and classifiers for these representations, are challenging problems. Our work provides innovative solutions to both these problems: image representation and classification. In order to generate our image representation, we extract visual features from the image and build a graphical structure based on properties of spatial proximity between the feature points. We show that certain spectral properties of this graph constitute good invariants to rigid geometric transforms. Our representation is based on these invariant properties. Experiments show that this representation constitutes an improvement over other similar representations that do not integrate the spatial layout of visual features. However, a drawback of this method is that it requires a lossy quantisation of the visual feature space in order to be combined with a state-of-the-art support vector machine (SVM) classifier. We address this issue by designing a new classifier. This generic classifier relies on a nearest-neighbour distance to classify objects that can be assimilated to feature sets, i.e., point clouds. The linearity of this classifier allows us to perform object detection, in addition to image classification. Another interesting property is its ability to combine different types of visual features in an optimal manner. We take advantage of this property to produce a new formulation for the classification of visual feature graphs.
Experiments are conducted on a wide variety of publicly available datasets to justify the benefits of our approach.
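As a rough illustration of classifying an object represented as a feature set with a nearest-neighbour distance, the sketch below scores a query set against each class by summing, over the query features, the squared distance to the nearest training feature of that class. This is a simplified stand-in and not the classifier developed in the thesis; the toy two-dimensional features, the class pools and the function names are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def set_to_class_distance(query_set, class_pool):
    """Sum, over query features, of the distance to the nearest pooled feature."""
    return sum(
        min(squared_distance(q, p) for p in class_pool) for q in query_set
    )


def classify(query_set, pools):
    """Return the label whose feature pool is closest to the query set."""
    return min(pools, key=lambda label: set_to_class_distance(query_set, pools[label]))


# Toy 2-D "visual features" pooled per class (illustrative values only).
pools = {
    "cat": [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)],
    "dog": [(1.0, 1.0), (0.9, 1.1), (1.2, 0.8)],
}

query = [(0.05, 0.1), (0.15, 0.15)]
print(classify(query, pools))
```

The score is a plain sum of per-feature terms, so the features inside any image sub-window can be scored independently of the rest. In this simplified setting, that additivity is what makes it natural to move from classifying whole images to scoring candidate regions for detection.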
|
16 |
Learning Semantic Features For Visual Recognition
Liu, Jingen 01 January 2009
Visual recognition (e.g., object, scene and action recognition) is an active area of research in computer vision due to its increasing number of real-world applications such as video (image) indexing and search, intelligent surveillance, human-machine interaction, robot navigation, etc. Effective modeling of the objects, scenes and actions is critical for visual recognition. Recently, the bag of visual words (BoVW) representation, in which the image patches or video cuboids are quantized into visual words (i.e., mid-level features) based on their appearance similarity using clustering, has been widely and successfully explored. The advantages of this representation are: no explicit detection or tracking of objects or object parts is required; the representation is somewhat tolerant to within-class deformations; and it is efficient for matching. However, the performance of the BoVW is sensitive to the size of the visual vocabulary. Therefore, computationally expensive cross-validation is needed to find the appropriate quantization granularity. This limitation is partially due to the fact that the visual words are not semantically meaningful, which limits the effectiveness and compactness of the representation. To overcome these shortcomings, in this thesis we present a principled approach to learning a semantic vocabulary (i.e., high-level features) from a large number of visual words (mid-level features). In this context, the thesis makes two major contributions. First, we have developed an algorithm to discover a compact yet discriminative semantic vocabulary. This vocabulary is obtained by grouping the visual words, based on their distribution in videos (images), into visual-word clusters. The mutual information (MI) between the clusters and the videos (images) reflects the discriminative power of the semantic vocabulary, while the MI between visual words and visual-word clusters measures the compactness of the vocabulary.
We apply the information bottleneck (IB) algorithm to find the optimal number of visual-word clusters by finding a good tradeoff between compactness and discriminative power. We tested our proposed approach on the state-of-the-art KTH dataset, and obtained an average accuracy of 94.2%. However, this approach performs one-sided clustering, because only visual words are clustered, regardless of which video they appear in. In order to leverage the co-occurrence of visual words and images, we have developed a co-clustering algorithm to simultaneously group the visual words and images. We tested our approach on the publicly available fifteen scene dataset and obtained about a 4% increase in average accuracy compared to the one-sided clustering approaches. Second, instead of grouping the mid-level features directly, we first embed the features into a low-dimensional semantic space by manifold learning, and then perform the clustering. We apply Diffusion Maps (DM) to capture the local geometric structure of the mid-level feature space. The DM embedding is able to preserve the explicitly defined diffusion distance, which reflects the semantic similarity between any two features. Furthermore, the DM provides multi-scale analysis capability by adjusting the time steps in the Markov transition matrix. The experiments on the KTH dataset show that DM can perform much better (about 3% to 6% improvement in average accuracy) than other manifold learning approaches and the IB method. The above methods use only a single type of feature. In order to combine multiple heterogeneous features for visual recognition, we further propose the Fiedler Embedding to capture the complicated semantic relationships between all entities (i.e., videos, images, heterogeneous features). The discovered relationships are then employed to further increase the recognition rate. We tested our approach on the Weizmann dataset, and achieved about 17% to 21% improvement in average accuracy.
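The mutual information used above to judge a vocabulary can be computed directly from a co-occurrence table. The sketch below is only an illustration under assumed toy counts (rows are visual words or clusters, columns are videos); it is not the information bottleneck algorithm itself. It shows that merging two visual words with the same distribution over videos makes the vocabulary more compact without losing information about the videos.

```python
from math import log2


def mutual_information(counts):
    """Mutual information (in bits) of a joint count table given as rows."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                # p(i, j) * log2( p(i, j) / (p(i) * p(j)) )
                mi += (c / total) * log2(c * total / (row_sums[i] * col_sums[j]))
    return mi


def merge_rows(counts, i, j):
    """Merge row j into row i, as when two visual words join one cluster."""
    merged = [a + b for a, b in zip(counts[i], counts[j])]
    return [merged if k == i else row for k, row in enumerate(counts) if k != j]


# Toy counts: three visual words (rows) observed in two videos (columns).
words_by_video = [
    [4, 0],
    [6, 0],
    [0, 10],
]

clusters_by_video = merge_rows(words_by_video, 0, 1)
print(mutual_information(words_by_video), mutual_information(clusters_by_video))
```

Here the first two words occur only in the first video, so clustering them reduces the vocabulary from three entries to two while the mutual information with the videos is unchanged.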
|
17 |
Les attributs sous-tendant la reconnaissance d'objets visuels faits de deux composantes
Lavoie, Marie-Audrey 12 1900
La perception de la forme visuelle est le principal médiateur de la reconnaissance d’objets. S’il y a consensus sur le fait que la détection des contours et l’analyse de fréquences spatiales sont les fondements de la vision primaire, la hiérarchie visuelle et les étapes subséquentes du traitement de l’information impliquées dans la reconnaissance d’objets sont quant à elles encore méconnues. Les données empiriques disponibles et pertinentes concernant la nature des traits primitifs qu’utilise véritablement le système visuel humain sont rares et aucune ne semble être entièrement concluante. Dans le but de palier à ce manque de données empiriques, la présente étude vise la découverte des régions de l’image utilisées par des participants humains lors d’une tâche de reconnaissance d’objets.
La technique des bulles a permis de révéler les zones diagnostiques permettant de discriminer entre les huit cibles de l’étude. Les zones ayant un effet facilitateur et celles ayant un effet inhibiteur sur les performances humaines et celles d’un observateur idéal furent identifiées. Les participants n’ont pas employé la totalité de l’information disponible dans l’image, mais seulement une infime partie, ce sont principalement les segments de contours présentant une discontinuité (i.e. convexités, concavités, intersections) qui furent sélectionnés par ces derniers afin de reconnaitre les cibles. L’identification des objets semble reposer sur des ensembles de caractéristiques distinctives de l’objet qui lui permettent d’être différencié des autres. Les informations les plus simples et utiles ont préséance et lorsqu’elles suffisent à mener à bien la tâche, le système visuel ne semble pas appliquer de traitement plus complexe, par exemple, l’encodage de caractéristiques plus complexes ou encore de conjonctions d’attributs simples. Cela appuie la notion voulant que le contexte influence la sélection des caractéristiques sous-tendant la reconnaissance d’objets et suggère que le type d’attributs varie en fonction de leur utilité dans un contexte donné. / The main mediator of visual object recognition is shape perception. While there is a consensus that contour detection and spatial frequency analysis are the foundations of early vision, the visual hierarchy and the nature of information processing in the subsequent stages involved in object recognition, remain widely unknown. Available and relevant empirical data concerning the nature of the primitive features used by the human visual system to recognize objects are scarce and none seems to be entirely conclusive. To overcome this lack of empirical data, this study aims to determine which regions of the images are used by humans when performing an object recognition task. 
The Bubbles technique revealed the diagnostic areas used by 12 adults and an ideal observer to discriminate between eight target objects. Stimulus areas with a facilitatory or inhibitory effect on performance were identified.
Humans used only a small subset of the information available to recognize the targets, which consisted mostly of discontinuous contour segments (i.e. convexities, concavities, intersections). Object recognition seems to rest upon contrasting sets of features which allow objects to be discriminated from one another. The simplest and most useful information seems to take precedence, and when it suffices for the task, the visual system does not engage in further processing involving, for instance, more complex features or the encoding of conjunctions of simple features. This implies that context influences the selection of features underlying human object recognition and suggests that attribute types can vary according to their utility in a given context.
|
18 |
Exploiting Competition Relationship for Robust Visual Recognition
DU, LIANG January 2015
Leveraging task relatedness has been proven to be beneficial in many machine learning tasks. Extensive research has been done to exploit task relatedness in various forms. A common assumption for the tasks is that they are intrinsically similar to each other. Based on this assumption, joint learning algorithms are usually implemented via some form of information sharing. Various forms of information sharing have been proposed, such as shared hidden units of neural networks, a common prior distribution in a hierarchical Bayesian model, shared weak learners of a boosting classifier, distance metrics and a shared low rank structure for multiple tasks. However, another very common and important task relationship, i.e., task competition, has been largely overlooked. Task competition means that tasks compete with each other if there are competitions or conflicts between their goals. Considering that tasks with a competition relationship are universal, this dissertation accommodates this intuition from an algorithmic perspective and applies the algorithms to various visual recognition problems. Focusing on exploiting task competition relationships in visual recognition, the dissertation presents three types of algorithms and applies them to different visual recognition tasks. First, hypothesis competition has been exploited in a boosting framework. The proposed algorithm, CompBoost, jointly models the target and auxiliary tasks with a generalized additive regression model regularized by competition constraints. This model treats feature selection as the weak learner (i.e., base function) selection problem, and thus provides a mechanism to improve feature filtering guided by task competition. More specifically, following a stepwise optimization scheme, we iteratively add a new weak learner that balances the gain for the target task against the inhibition of the auxiliary ones.
We call the proposed algorithm CompBoost, since it shares a similar structure with the popular AdaBoost algorithm. In this dissertation, we use two test beds for the evaluation of CompBoost: (1) content-independent writer identification by exploiting competing tasks of handwriting recognition, and (2) actor-independent facial expression recognition by exploiting competing tasks of face recognition. In the experiments for both applications, the approach demonstrates promising performance gains by exploiting the between-task competition relationship. Second, feature competition has been instantiated through an alternating coordinate gradient algorithm. Sharing the same feature pool, two tasks are modeled together in a joint loss framework, with feature interaction encouraged via an orthogonal regularization over feature importance vectors. Then, an alternating greedy coordinate descent learning algorithm (AGCD) is derived to estimate the model. The algorithm effectively excludes distracting features at a fine-grained level to improve face verification. In other words, the proposed algorithm does not forbid feature sharing between competing tasks at a macro level; it instead selectively inhibits distracting features while preserving discriminative ones. For evaluation, the proposed algorithm is applied to two widely tested face-aging benchmark datasets: FG-Net and MORPH. On both datasets, our algorithm achieves very promising performance and outperforms all previously reported results. These experiments, together with detailed experimental analysis, clearly show the benefit of coordinating conflicting tasks for improving visual recognition. Third, two ad-hoc feature competition algorithms have been proposed for visual privacy protection problems. The visual privacy protection problem is a practical case of competing factors in real-world applications.
The algorithms are specially designed to achieve the best balance between competing factors in visual privacy protection based on different modeling frameworks. Two algorithms are developed for two applications: license plate de-identification and face de-identification. / Computer and Information Science
|
19 |
Vizuální aspekty individuálního rozpoznávání u papoušků šedých / Visual aspects of individual recognition in grey parrots
Prikrylová, Katarína January 2018
This diploma thesis focuses on individual recognition in African grey parrots based on visual cues from stimulus cards made from photographs of familiar conspecifics, specifically on testing the significance of selected visual features. The theoretical part of the thesis deals with the general ability of individual recognition, followed by the specifics of this ability in humans, non-human primates and birds, including the African grey parrot. Particular attention is paid to the individual recognition ability in humans, since the aim of this thesis is to interpret the results of the hypothesis testing comparatively. In order to test the hypotheses, seven modifications of conspecifics' photographs were created. The experiment employed the matching-to-sample method; the subjects were three African grey parrots. The findings were to a large extent in accordance with those of analogous studies that used human subjects. The results imply that visual information in the African grey parrot is processed holistically, with the structure and pigmentation of the feathers on the abdominal part of the parrot most likely having the highest significance for successful individual recognition of a familiar conspecific. Key words: individual recognition, African grey parrot, comparative cognition, visual recognition
|
20 |
Rôle du contrôle cognitif dans les modulations du langage et des émotions : l'exemple de la schizophrénie et des troubles bipolaires / Role of cognitif control in the modulations of the langage and the emotions : the example of the schizophrénia and the bipolar disorders
Guillery, Murielle 10 February 2017
L’étude présentée explore les modulations du contrôle émotionnel dans les interactions du langage et des émotions, chez 23 sujets atteints de schizophrénie en état de stabilisation et 21 sujets atteints de troubles bipolaires en phase euthymique. Les interactions ont été envisagées d’une part dans le sens des émotions via le langage avec une tâche expérimentale de Stroop émotionnel conditionné, puis en contraste dans le sens du langage via les émotions avec une tâche expérimentale de décision lexicale avec des voisins orthographiques à connotation émotionnelle. Les résultats mettent en évidence une hyper‐réactivité émotionnelle positive dans les troubles bipolaires et des troubles du contrôle cognitif émotionnel dans la schizophrénie. Ces deux maladies présentent des chevauchements dans les altérations cognitives qui ne permettent pas encore de distinguer des marqueurs cognitifs. Cependant, les résultats de cette étude indiquent que les processus impliqués dans les perturbations du traitement des mots à connotation émotionnelle sont de natures différentes entre ces deux pathologies. Dès lors, le présent dispositif pourrait s’avérer utile pour différencier la schizophrénie des troubles bipolaires. / The present study explores the modulations of emotional control in the interactions between language and emotions in 23 subjects with schizophrenia in a stabilized state and 21 subjects with bipolar disorder in the euthymic phase. The interactions were examined, on the one hand, in the direction of emotions via language, using an experimental conditioned emotional Stroop task, and, in contrast, in the direction of language via emotions, using an experimental lexical decision task with orthographic neighbors carrying an emotional connotation. The results highlight positive emotional hyper-reactivity in bipolar disorder and impaired emotional cognitive control in schizophrenia.
These two disorders show overlapping cognitive impairments, which do not yet make it possible to distinguish cognitive markers. However, the results of this study indicate that the processes involved in the disrupted processing of words with an emotional connotation differ in nature between the two pathologies. The present approach could therefore prove useful for differentiating schizophrenia from bipolar disorder.
|