Spelling suggestions: "subject:"cisual words"" "subject:"4visual words""
11 |
Méthodes probabilistes basées sur les mots visuels pour la reconnaissance de lieux sémantiques par un robot mobile / Visual words based probalistic methods for semantic places recognitionDubois, Mathieu 20 February 2012 (has links)
Les êtres humains définissent naturellement leur espace quotidien en unités discrètes. Par exemple, nous sommes capables d'identifier le lieu où nous sommes (e.g. le bureau 205) et sa catégorie (i.e. un bureau), sur la base de leur seule apparence visuelle. Les travaux récents en reconnaissance de lieux sémantiques, visent à doter les robots de capacités similaires. Ces unités, appelées "lieux sémantiques", sont caractérisées par une extension spatiale et une unité fonctionnelle, ce qui distingue ce domaine des travaux habituels en cartographie. Nous présentons nos travaux dans le domaine de la reconnaissance de lieux sémantiques. Ces derniers ont plusieurs originalités par rapport à l'état de l'art. Premièrement, ils combinent la caractérisation globale d'une image, intéressante car elle permet de s'affranchir des variations locales de l'apparence des lieux, et les méthodes basées sur les mots visuels, qui reposent sur la classification non-supervisée de descripteurs locaux. Deuxièmement, et de manière intimement reliée, ils tirent parti du flux d'images fourni par le robot en utilisant des méthodes bayésiennes d'intégration temporelle. Dans un premier modèle, nous ne tenons pas compte de l'ordre des images. Le mécanisme d'intégration est donc particulièrement simple mais montre des difficultés à repérer les changements de lieux. Nous élaborons donc plusieurs mécanismes de détection des transitions entre lieux qui ne nécessitent pas d'apprentissage supplémentaire. Une deuxième version enrichit le formalisme classique du filtrage bayésien en utilisant l'ordre local d'apparition des images. Nous comparons nos méthodes à l'état de l'art sur des tâches de reconnaissance d'instances et de catégorisation, en utilisant plusieurs bases de données. Nous étudions l'influence des paramètres sur les performances et comparons les différents types de codage employés sur une même base.Ces expériences montrent que nos méthodes sont supérieures à l'état de l'art, en particulier sur les tâches de catégorisation. / Human beings naturally organize their space as composed of discrete units. Those units, called "semantic places", are characterized by their spatial extend and their functional unity. Moreover, we are able to quickly recognize a given place (e.g. office 205) and its category (i.e. an office), solely on their visual appearance. Recent works in semantic place recognition seek to endow the robot with similar capabilities. Contrary to classical localization and mapping work, this problem is usually tackled as a supervised learning problem. Our contributions are two fold. First, we combine global image characterization, which captures the global organization of the image, and visual words methods which are usually based unsupervised classification of local signatures. Our second but closely related, contribution is to use several images for recognition by using Bayesian methods for temporal integration. Our first model don't use the natural temporal ordering of images. Temporal integration is very simple but has difficulties when the robot moves from one place to another.We thus develop several mechanisms to detect place transitions. Those mechanisms are simple and don't require additional learning. A second model augment the classical Bayesian filtering approach by using the local order among images. We compare our methods to state-of-the-art algorithms on place recognition and place categorization tasks.We study the influence of system parameters and compare the different global characterization methods on the same dataset. These experiments show that our approach while being simple leads to better results especially on the place categorization task.
|
12 |
Caracterização e recuperação de imagens usando dicionários visuais semanticamente enriquecidos / Image characterization and retrieval using visual dictionaries semantically enrichedPedrosa, Glauco Vitor 24 August 2015 (has links)
A análise automática da similaridade entre imagens depende fortemente de descritores que consigam caracterizar o conteúdo das imagens em dados compactos e discriminativos. Esses dados extraídos e representados em um vetor-de-características tem o objetivo de representar as imagens nos processos de mineração e análise para classificação e/ou recuperação. Neste trabalho foi explorado o uso de dicionários visuais e contexto para representar e recuperar as características locais das imagens utilizando formalismos estendidos com alto poder descritivo. Esta tese apresenta em destaque três novas propostas que contribuem competitivamente com outros trabalhos da literatura no avanço do estado-da-arte, desenvolvendo novas metodologias para a caracterização de imagens e para o processamento de consultas por similaridade. A primeira proposta estende a modelagem Bag-of-Visual-Words, permitindo codificar a interação entre palavras-visuais e suas disposições espaciais na imagem. Para tal fim, três novas abordagem são apresentadas: (i) Weighted Histogram (WE); (ii) Bunch-of-2-grams e (iii) Global Spatial Arrangement (GSA). Cada uma dessas técnicas permitem extrair informações semanticamente complementares, que enriquecem a representação final das imagens descritas em palavras-visuais. A segunda proposta apresenta um novo descritor, chamado de Bag-of-Salience-Points (BoSP), que caracteriza e analisa a dissimilaridade de formas (silhuetas) de objetos explorando seus pontos de saliências. O descritor BoSP se apoia no uso de um dicionário de curvaturas e em histogramas espaciais para representar sucintamente as saliências de um objeto em um único vetor-de-características de tamanho fixo, permitindo recuperar formas usando funções de distâncias computacionalmente rápidas. Por fim, a terceira proposta apresenta um novo modelo de consulta por similaridade, denominada Similarity Based on Dominant Images (SimDIm), baseada no conceito de Imagens Dominantes, que é um conjunto que representa, de uma maneira mais diversificada e reduzida, toda a coleção de imagens da base de dados. Tal conceito permite dar mais eficiência quando se deseja analisar o contexto da coleção, que é o objetivo da proposta. Os experimentos realizados mostram que os métodos propostos contribuem de maneira efetiva para caracterizar e quantificar a similaridade entre imagens por meio de abordagens estendidas baseadas em dicionários visuais e análise contextual, reduzindo a lacuna semântica existente entre a percepção humana e a descrição computacional. / The automatic similarity analysis between images depends heavily on the use of descriptors that should be able to characterize the images\' content in compact and discriminative features. These extracted features are represented by a feature-vector employed to represent the images in the process of mining and analysis for classification and/or retrieval. This work investigated the use of visual dictionaries and context to represent and retrieve the local image features using extended formalism with high descriptive power. This thesis presents three new proposals that contribute in advancing the state-of-the-art by developing new methodologies for characterizing images and for processing similarity queries by content. The first proposal extends the Bag-of-Visual-Words model, by encoding the interaction between the visual words and their spatial arrangements in the image space. For this, three new techniques are presented: (i) Weighted Histogram (WE); (ii) Bunch-of--grams and (iii) Global Spatial Arrangement (GSA). These three techniques allow to extract additional semantically information that enrich the final image representation described in visual-words. The second proposal introduces a new descriptor, called Bag-of-Salience-Points (BoSP), which characterizes and analyzes the dissimilarity of shapes (silhouettes) exploring their salient point. The BoSP descriptor is based on using a dictionary of curvatures and spatial-histograms to represent succinctly the saliences of a shape into a single fixed-length feature-vector, allowing to retrieve shapes using distance functions computationally fast. Finally, the third proposal introduces a new similarity query model, called Similarity based on Dominant Images (SimDIm), based on the concept of dominant images, which is a set of images representing the entire collection of images of the database in a more diversified and reduced manner. This concept allows to efficiently analyze the context of the entire collection, which is the final goal. The experiments showed that the proposed methods effectively contributed to characterize and quantify the similarity between images using extended approaches based on visual dictionaries and contextual analysis, reducing the semantic gap between human perception and computational description.
|
13 |
A bag of features approach for human attribute analysis on face images / Uma abordagem \"bag of features\" para análise de atributos humanos em imagens de facesAraujo, Rafael Will Macêdo de 06 September 2019 (has links)
Computer Vision researchers are constantly challenged with questions that are motivated by real applications. One of these questions is whether a computer program could distinguish groups of people based on their geographical ancestry, using only frontal images of their faces. The advances in this research area in the last ten years show that the answer to that question is affirmative. Several papers address this problem by applying methods such as Local Binary Patterns (LBP), raw pixel values, Principal or Independent Component Analysis (PCA/ICA), Gabor filters, Biologically Inspired Features (BIF), and more recently, Convolution Neural Networks (CNN). In this work we propose to combine the Bag-of-Visual-Words model with new dictionary learning techniques and a new spatial structure approach for image features. An extensive set of experiments has been performed using two of the largest face image databases available (MORPH-II and FERET), reaching very competitive results for gender and ethnicity recognition, while using a considerable small set of images for training. / Pesquisadores de visão computacional são constantemente desafiados com perguntas motivadas por aplicações reais. Uma dessas questões é se um programa de computador poderia distinguir grupos de pessoas com base em sua ascendência geográfica, usando apenas imagens frontais de seus rostos. Os avanços nesta área de pesquisa nos últimos dez anos mostram que a resposta a essa pergunta é afirmativa. Vários artigos abordam esse problema aplicando métodos como Padrões Binários Locais (LBP), valores de pixels brutos, Análise de Componentes Principais ou Independentes (PCA/ICA), filtros de Gabor, Características Biologicamente Inspiradas (BIF) e, mais recentemente, Redes Neurais Convolucionais (CNN). Neste trabalho propomos combinar o modelo \"bag-of-words\" visual com novas técnicas de aprendizagem por dicionário e uma nova abordagem de estrutura espacial para características da imagem. Um extenso conjunto de experimentos foi realizado usando dois dos maiores bancos de dados de imagens faciais disponíveis (MORPH-II e FERET), alcançando resultados muito competitivos para reconhecimento de gênero e etnia, ao passo que utiliza um conjunto consideravelmente pequeno de imagens para treinamento.
|
14 |
Reconnaissance perceptuelle des objets d’Intérêt : application à l’interprétation des activités instrumentales de la vie quotidienne pour les études de démence / Perceptual object of interest recognition : application to the interpretation of instrumental activities of daily living for dementia studiesBuso, Vincent 30 November 2015 (has links)
Cette thèse est motivée par le diagnostic, l’évaluation, la maintenance et la promotion de l’indépendance des personnes souffrant de maladies démentielles pour leurs activités de la vie quotidienne. Dans ce contexte nous nous intéressons à la reconnaissance automatique des activités de la vie quotidienne.L’analyse des vidéos de type égocentriques (où la caméra est posée sur une personne) a récemment gagné beaucoup d’intérêt en faveur de cette tâche. En effet de récentes études démontrent l’importance cruciale de la reconnaissance des objets actifs (manipulés ou observés par le patient) pour la reconnaissance d’activités et les vidéos égocentriques présentent l’avantage d’avoir une forte différenciation entre les objets actifs et passifs (associés à l’arrière plan). Une des approches récentes envers la reconnaissance des éléments actifs dans une scène est l’incorporation de la saillance visuelle dans les algorithmes de reconnaissance d’objets. Modéliser le processus sélectif du système visuel humain représente un moyen efficace de focaliser l’analyse d’une scène vers les endroits considérés d’intérêts ou saillants,qui, dans les vidéos égocentriques, correspondent fortement aux emplacements des objets d’intérêt. L’objectif de cette thèse est de permettre au systèmes de reconnaissance d’objets de fournir une détection plus précise des objets d’intérêts grâce à la saillance visuelle afin d’améliorer les performances de reconnaissances d’activités de la vie de tous les jours. Cette thèse est menée dans le cadre du projet Européen Dem@care.Concernant le vaste domaine de la modélisation de la saillance visuelle, nous étudions et proposons une contribution à la fois dans le domaine "Bottom-up" (regard attiré par des stimuli) que dans le domaine "Top-down" (regard attiré par la sémantique) qui ont pour but d’améliorer la reconnaissance d’objets actifs dans les vidéos égocentriques. Notre première contribution pour les modèles Bottom-up prend racine du fait que les observateurs d’une vidéo sont normalement attirés par le centre de celle-ci. Ce phénomène biologique s’appelle le biais central. Dans les vidéos égocentriques cependant, cette hypothèse n’est plus valable.Nous proposons et étudions des modèles de saillance basés sur ce phénomène de biais non central.Les modèles proposés sont entrainés à partir de fixations d’oeil enregistrées et incorporées dans des modèles spatio-temporels. Lorsque comparés à l’état-de-l’art des modèles Bottom-up, ceux que nous présentons montrent des résultats prometteurs qui illustrent la nécessité d’un modèle géométrique biaisé non-centré dans ce type de vidéos. Pour notre contribution dans le domaine Top-down, nous présentons un modèle probabiliste d’attention visuelle pour la reconnaissance d’objets manipulés dans les vidéos égocentriques. Bien que les bras soient souvent source d’occlusion des objets et considérés comme un fardeau, ils deviennent un atout dans notre approche. En effet nous extrayons à la fois des caractéristiques globales et locales permettant d’estimer leur disposition géométrique. Nous intégrons cette information dans un modèle probabiliste, avec équations de mise a jour pour optimiser la vraisemblance du modèle en fonction de ses paramètres et enfin générons les cartes d’attention visuelle pour la reconnaissance d’objets manipulés. [...] / The rationale and motivation of this PhD thesis is in the diagnosis, assessment,maintenance and promotion of self-independence of people with dementia in their InstrumentalActivities of Daily Living (IADLs). In this context a strong focus is held towardsthe task of automatically recognizing IADLs. Egocentric video analysis (cameras worn by aperson) has recently gained much interest regarding this goal. Indeed recent studies havedemonstrated how crucial is the recognition of active objects (manipulated or observedby the person wearing the camera) for the activity recognition task and egocentric videospresent the advantage of holding a strong differentiation between active and passive objects(associated to background). One recent approach towards finding active elements in a sceneis the incorporation of visual saliency in the object recognition paradigms. Modeling theselective process of human perception of visual scenes represents an efficient way to drivethe scene analysis towards particular areas considered of interest or salient, which, in egocentricvideos, strongly corresponds to the locus of objects of interest. The objective of thisthesis is to design an object recognition system that relies on visual saliency-maps to providemore precise object representations, that are robust against background clutter and, therefore,improve the recognition of active object for the IADLs recognition task. This PhD thesisis conducted in the framework of the Dem@care European project.Regarding the vast field of visual saliency modeling, we investigate and propose a contributionin both Bottom-up (gaze driven by stimuli) and Top-down (gaze driven by semantics)areas that aim at enhancing the particular task of active object recognition in egocentricvideo content. Our first contribution on Bottom-up models originates from the fact thatobservers are attracted by a central stimulus (the center of an image). This biological phenomenonis known as central bias. In egocentric videos however this hypothesis does not alwayshold. We study saliency models with non-central bias geometrical cues. The proposedvisual saliency models are trained based on eye fixations of observers and incorporated intospatio-temporal saliency models. When compared to state of the art visual saliency models,the ones we present show promising results as they highlight the necessity of a non-centeredgeometric saliency cue. For our top-down model contribution we present a probabilisticvisual attention model for manipulated object recognition in egocentric video content. Althougharms often occlude objects and are usually considered as a burden for many visionsystems, they become an asset in our approach, as we extract both global and local featuresdescribing their geometric layout and pose, as well as the objects being manipulated. We integratethis information in a probabilistic generative model, provide update equations thatautomatically compute the model parameters optimizing the likelihood of the data, and designa method to generate maps of visual attention that are later used in an object-recognitionframework. This task-driven assessment reveals that the proposed method outperforms thestate-of-the-art in object recognition for egocentric video content. [...]
|
15 |
Caracterização e recuperação de imagens usando dicionários visuais semanticamente enriquecidos / Image characterization and retrieval using visual dictionaries semantically enrichedGlauco Vitor Pedrosa 24 August 2015 (has links)
A análise automática da similaridade entre imagens depende fortemente de descritores que consigam caracterizar o conteúdo das imagens em dados compactos e discriminativos. Esses dados extraídos e representados em um vetor-de-características tem o objetivo de representar as imagens nos processos de mineração e análise para classificação e/ou recuperação. Neste trabalho foi explorado o uso de dicionários visuais e contexto para representar e recuperar as características locais das imagens utilizando formalismos estendidos com alto poder descritivo. Esta tese apresenta em destaque três novas propostas que contribuem competitivamente com outros trabalhos da literatura no avanço do estado-da-arte, desenvolvendo novas metodologias para a caracterização de imagens e para o processamento de consultas por similaridade. A primeira proposta estende a modelagem Bag-of-Visual-Words, permitindo codificar a interação entre palavras-visuais e suas disposições espaciais na imagem. Para tal fim, três novas abordagem são apresentadas: (i) Weighted Histogram (WE); (ii) Bunch-of-2-grams e (iii) Global Spatial Arrangement (GSA). Cada uma dessas técnicas permitem extrair informações semanticamente complementares, que enriquecem a representação final das imagens descritas em palavras-visuais. A segunda proposta apresenta um novo descritor, chamado de Bag-of-Salience-Points (BoSP), que caracteriza e analisa a dissimilaridade de formas (silhuetas) de objetos explorando seus pontos de saliências. O descritor BoSP se apoia no uso de um dicionário de curvaturas e em histogramas espaciais para representar sucintamente as saliências de um objeto em um único vetor-de-características de tamanho fixo, permitindo recuperar formas usando funções de distâncias computacionalmente rápidas. Por fim, a terceira proposta apresenta um novo modelo de consulta por similaridade, denominada Similarity Based on Dominant Images (SimDIm), baseada no conceito de Imagens Dominantes, que é um conjunto que representa, de uma maneira mais diversificada e reduzida, toda a coleção de imagens da base de dados. Tal conceito permite dar mais eficiência quando se deseja analisar o contexto da coleção, que é o objetivo da proposta. Os experimentos realizados mostram que os métodos propostos contribuem de maneira efetiva para caracterizar e quantificar a similaridade entre imagens por meio de abordagens estendidas baseadas em dicionários visuais e análise contextual, reduzindo a lacuna semântica existente entre a percepção humana e a descrição computacional. / The automatic similarity analysis between images depends heavily on the use of descriptors that should be able to characterize the images\' content in compact and discriminative features. These extracted features are represented by a feature-vector employed to represent the images in the process of mining and analysis for classification and/or retrieval. This work investigated the use of visual dictionaries and context to represent and retrieve the local image features using extended formalism with high descriptive power. This thesis presents three new proposals that contribute in advancing the state-of-the-art by developing new methodologies for characterizing images and for processing similarity queries by content. The first proposal extends the Bag-of-Visual-Words model, by encoding the interaction between the visual words and their spatial arrangements in the image space. For this, three new techniques are presented: (i) Weighted Histogram (WE); (ii) Bunch-of--grams and (iii) Global Spatial Arrangement (GSA). These three techniques allow to extract additional semantically information that enrich the final image representation described in visual-words. The second proposal introduces a new descriptor, called Bag-of-Salience-Points (BoSP), which characterizes and analyzes the dissimilarity of shapes (silhouettes) exploring their salient point. The BoSP descriptor is based on using a dictionary of curvatures and spatial-histograms to represent succinctly the saliences of a shape into a single fixed-length feature-vector, allowing to retrieve shapes using distance functions computationally fast. Finally, the third proposal introduces a new similarity query model, called Similarity based on Dominant Images (SimDIm), based on the concept of dominant images, which is a set of images representing the entire collection of images of the database in a more diversified and reduced manner. This concept allows to efficiently analyze the context of the entire collection, which is the final goal. The experiments showed that the proposed methods effectively contributed to characterize and quantify the similarity between images using extended approaches based on visual dictionaries and contextual analysis, reducing the semantic gap between human perception and computational description.
|
16 |
Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classificationAlqasrawi, Yousef T. N., Neagu, Daniel, Cowling, Peter I. January 2013 (has links)
No / The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies, from automatically extracted image feature vectors, produces discriminative visual words, which can improve the accuracy of image categorization tasks. Most approaches that use the BOW model in categorizing images ignore useful information that can be obtained from image classes to build visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and improves accuracy in natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from training images of one scene image dataset can plausibly represent another scene image dataset on the same domain. This helps in reducing time and effort needed to build new visual vocabularies. The proposed approach is evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
|
17 |
Descripteurs augmentés basés sur l'information sémantique contextuelle / Toward semantic-shape-context-based augmented descriptorKhoualed, Samir 29 November 2012 (has links)
Les techniques de description des éléments caractéristiques d’une image sont omniprésentes dans de nombreuses applications de vision par ordinateur. Nous proposons à travers ce manuscrit une extension, pour décrire (représenter) et apparier les éléments caractéristiques des images. L’extension proposée consiste en une approche originale pour apprendre, ou estimer, la présence sémantique des éléments caractéristiques locaux dans les images. L’information sémantique obtenue est ensuite exploitée, en conjonction avec le paradigme de sac-de-mots, pour construire un descripteur d’image performant. Le descripteur résultant, est la combinaison de deux types d’informations, locale et contextuelle-sémantique. L’approche proposée peut être généralisée et adaptée à n’importe quel descripteur local d’image, pour améliorer fortement ses performances spécialement quand l’image est soumise à des conditions d’imagerie contraintes. La performance de l’approche proposée est évaluée avec des images réelles aussi bien dans les deux domaines, 2D que 3D. Nous avons abordé dans le domaine 2D, un problème lié à l’appariement des éléments caractéristiques dans des images. Dans le domaine 3D, nous avons résolu les problèmes d’appariement et alignement des vues partielles tridimensionnelles. Les résultats obtenus ont montré qu’avec notre approche, les performances sont nettement meilleures par rapport aux autres méthodes existantes. / This manuscript presents an extension of feature description and matching strategies by proposing an original approach to learn the semantic information of local features. This semantic is then exploited, in conjunction with the bag-of-words paradigm, to build a powerful feature descriptor. The approach, ended up by combining local and context information into a single descriptor, is also a generalized method for improving the performance of the local features, in terms of distinctiveness and robustness under geometric image transformations and imaging conditions. The performance of the proposed approach is evaluated on real world data sets as well as in both the 2D and 3D domains. The 2D domain application addresses the problem of image feature matching while in 3D domain, we resolve the issue of matching and alignment of multiple range images. The evaluation results showed our approach performs significantly better than expected results as well as in comparison with other methods.
|
18 |
Discriminative image representations using spatial and color information for category-level classification / Représentations discriminantes d'image intégrant information spatiale et couleur pour la classification d'imagesKhan, Rahat 08 October 2013 (has links)
La représentation d'image est au cœur de beaucoup d'algorithmes de vision par ordinateur. Elle intervient notamment dans des tâches de reconnaissance de catégories visuelles comme la classification ou la détection d'objets. Dans ce contexte, la représentation "sac de mot visuel" (Bag of Visual Words ou BoVW en anglais) est l'une des méthodes de référence. Dans cette thèse, nous nous appuyons sur ce modèle pour proposer des représentations d'images discriminantes. Dans la première partie, nous présentons une nouvelle approche simple et efficace pour prendre en compte des informations spatiales dans le modèle BoVW. Son principe est de considérer l'orientation et la longueur de segments formés par des paires de descripteurs similaires. Une notion de "softsimilarité" est introduite pour définir ces relations intra et inter mots visuels. Nous montrons expérimentalement que notre méthode ajoute une information discriminante importante au modèle BoVW et que cette information est complémentaire aux méthodes de l'état de l'art. Ensuite, nous nous focalisons sur la description de l'information couleur. Contrairement aux approches traditionnelles qui s'appuient sur des descriptions invariantes aux changements d'éclairage, nous proposons un descripteur basé sur le pouvoir discriminant. Nos expérimentations permettent de conclure que ce descripteur apprend automatiquement un certain degré d'invariance photométrique tout en surclassant les descripteurs basés sur cette invariance photométrique. De plus, combiné avec un descripteur de forme, le descripteur proposé donne des résultats excellents sur quatre jeux de données particulièrement difficiles. Enfin, nous nous intéressons à la représentation de la couleur à partir de la réflectance multispectrale des surfaces observées, information difficile à extraire sans instruments sophistiqués. Ainsi, nous proposons d'utiliser l'écran et la caméra d'un appareil portable pour capturer des images éclairées par les couleurs primaires de l'écran. Trois éclairages et trois réponses de caméra produisent neuf valeurs pour estimer la réflectance. Les résultats montrent que la précision de la reconstruction spectrale est meilleure que celle estimée avec un seul éclairage. Nous concluons que ce type d'acquisition est possible avec des appareils grand public tels que les tablettes, téléphones ou ordinateurs portables / Image representation is in the heart of many computer vision algorithms. Different computer vision tasks (e.g. classification, detection) require discriminative image representations to recognize visual categories. In a nutshell, the bag-of-visual-words image representation is the most successful approach for object and scene recognition. In this thesis, we mainly revolve around this model and search for discriminative image representations. In the first part, we present a novel approach to incorporate spatial information in the BoVW method. In this framework, we present a simple and efficient way to infuse spatial information by taking advantage of the orientation and length of the segments formed by pairs of similar descriptors. We introduce the notion of soft-similarity to compute intra and inter visual word spatial relationships. We show experimentally that, our method adds important discriminative information to the BoVW method and complementary to the state-of-the-art method. Next, we focus on color description in general. Differing from traditional approaches of invariant description to account for photometric changes, we propose discriminative color descriptor. We demonstrate that such a color description automatically learns a certain degree of photometric invariance. Experiments show that the proposed descriptor outperforms existing photometric invariants. Furthermore, we show that combined with shape descriptor, the proposed color descriptor obtain excellent results on four challenging data sets.Finally, we focus on the most accurate color representation i.e. multispectral reflectance which is an intrinsic property of a surface. Even with the modern era technological advancement, it is difficult to extract reflectance information without sophisticated instruments. To this end, we propose to use the display of the device as an illuminant while the camera captures images illuminated by the red, green and blue primaries of the display. Three illuminants and three response functions of the camera lead to nine response values which are used for reflectance estimation. Results show that the accuracy of the spectral reconstruction improves significantly over the spectral reconstruction based on a single illuminant. We conclude that, multispectral data acquisition is potentially possible with consumer hand-held devices such as tablets, mobiles, and laptops
|
19 |
Sparse representations over learned dictionary for document analysis / Présentations parcimonieuses sur dictionnaire d'apprentissage pour l'analyse de documentsDo, Thanh Ha 04 April 2014 (has links)
Dans cette thèse, nous nous concentrons sur comment les représentations parcimonieuses peuvent aider à augmenter les performances pour réduire le bruit, extraire des régions de texte, reconnaissance des formes et localiser des symboles dans des documents graphiques. Pour ce faire, tout d'abord, nous donnons une synthèse des représentations parcimonieuses et ses applications en traitement d'images. Ensuite, nous présentons notre motivation pour l'utilisation de dictionnaires d'apprentissage avec des algorithmes efficaces pour les construire. Après avoir décrit l'idée générale des représentations parcimonieuses et du dictionnaire d'apprentissage, nous présentons nos contributions dans le domaine de la reconnaissance de symboles et du traitement des documents en les comparants aux travaux de l'état de l'art. Ces contributions s'emploient à répondre aux questions suivantes: La première question est comment nous pouvons supprimer le bruit des images où il n'existe aucune hypothèse sur le modèle de bruit sous-jacent à ces images ? La deuxième question est comment les représentations parcimonieuses sur le dictionnaire d'apprentissage peuvent être adaptées pour séparer le texte du graphique dans des documents? La troisième question est comment nous pouvons appliquer la représentation parcimonieuse à reconnaissance de symboles? Nous complétons cette thèse en proposant une approche de localisation de symboles dans les documents graphiques qui utilise les représentations parcimonieuses pour coder un vocabulaire visuel / In this thesis, we focus on how sparse representations can help to increase the performance of noise removal, text region extraction, pattern recognition and spotting symbols in graphical documents. To do that, first of all, we give a survey of sparse representations and its applications in image processing. Then, we present the motivation of building learning dictionary and efficient algorithms for constructing a learning dictionary. After describing the general idea of sparse representations and learned dictionary, we bring some contributions in the field of symbol recognition and document processing that achieve better performances compared to the state-of-the-art. These contributions begin by finding the answers to the following questions. The first question is how we can remove the noise of a document when we have no assumptions about the model of noise found in these images? The second question is how sparse representations over learned dictionary can separate the text/graphic parts in the graphical document? The third question is how we can apply the sparse representation for symbol recognition? We complete this thesis by proposing an approach of spotting symbols that use sparse representations for the coding of a visual vocabulary
|
20 |
Indexation bio-inspirée pour la recherche d'images par similarité / Bio-inspired Indexing for Content-Based Image RetrievalMichaud, Dorian 16 October 2018 (has links)
La recherche d'images basée sur le contenu visuel est un domaine très actif de la vision par ordinateur, car le nombre de bases d'images disponibles ne cesse d'augmenter.L’objectif de ce type d’approche est de retourner les images les plus proches d'une requête donnée en terme de contenu visuel.Notre travail s'inscrit dans un contexte applicatif spécifique qui consiste à indexer des petites bases d'images expertes sur lesquelles nous n'avons aucune connaissance a priori.L’une de nos contributions pour palier ce problème consiste à choisir un ensemble de descripteurs visuels et de les placer en compétition directe. Nous utilisons deux stratégies pour combiner ces caractéristiques : la première, est pyschovisuelle, et la seconde, est statistique.Dans ce contexte, nous proposons une approche adaptative non supervisée, basée sur les sacs de mots et phrases visuels, dont le principe est de sélectionner les caractéristiques pertinentes pour chaque point d'intérêt dans le but de renforcer la représentation de l'image.Les tests effectués montrent l'intérêt d'utiliser ce type de méthodes malgré la domination des méthodes basées réseaux de neurones convolutifs dans la littérature.Nous proposons également une étude, ainsi que les résultats de nos premiers tests concernant le renforcement de la recherche en utilisant des méthodes semi-interactives basées sur l’expertise de l'utilisateur. / Image Retrieval is still a very active field of image processing as the number of available image datasets continuously increases.One of the principal objectives of Content-Based Image Retrieval (CBIR) is to return the most similar images to a given query with respect to their visual content.Our work fits in a very specific application context: indexing small expert image datasets, with no prior knowledge on the images. Because of the image complexity, one of our contributions is the choice of effective descriptors from literature placed in direct competition.Two strategies are used to combine features: a psycho-visual one and a statistical one.In this context, we propose an unsupervised and adaptive framework based on the well-known bags of visual words and phrases models that select relevant visual descriptors for each keypoint to construct a more discriminative image representation.Experiments show the interest of using this this type of methodologies during a time when convolutional neural networks are ubiquitous.We also propose a study about semi interactive retrieval to improve the accuracy of CBIR systems by using the knowledge of the expert users.
|
Page generated in 0.0383 seconds