Global ETD Search

11	Real-time Hand Gesture Detection and Recognition for Human Computer Interaction
Dardas, Nasser Hasan Abdel-Qader (January 2012)
This thesis focuses on bare hand gesture recognition by proposing a new architecture to solve the problem of real-time vision-based hand detection, tracking, and gesture recognition for interaction with an application via hand gestures. The first stage of our system detects and tracks a bare hand in a cluttered background using face subtraction, skin detection and contour comparison. The second stage recognizes hand gestures using bag-of-features and multi-class Support Vector Machine (SVM) algorithms. Finally, a grammar has been developed to generate gesture commands for application control.
Our hand gesture recognition system consists of two steps: offline training and online testing. In the training stage, keypoints are extracted from every training image using the Scale-Invariant Feature Transform (SIFT); a vector quantization technique then maps the keypoints of each training image, after K-means clustering, into a histogram vector of uniform dimension (bag-of-words). This histogram is treated as the input vector for a multi-class SVM to build the classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using our detection algorithm. The keypoints are then extracted from the small image that contains the detected hand posture and fed into the cluster model to map them into a bag-of-words vector, which is fed into the multi-class SVM classifier to recognize the hand gesture.
A second hand gesture recognition system was proposed using Principal Component Analysis (PCA). The most significant eigenvectors and the weights of the training images are determined. In the testing stage, the hand posture is detected in every frame using our detection algorithm. The small image that contains the detected hand is then projected onto the most significant eigenvectors of the training images to form its test weights. Finally, the gesture is recognized by finding the minimum Euclidean distance between the test weights and the training weights of each training image.
Two applications of gesture-based interaction with a 3D gaming virtual environment were implemented. The first, an exertion video game, uses a stationary bicycle as one of the main inputs for game playing; the user controls left-right movement and shooting actions through a set of hand gesture commands. In the second game, the user steers a helicopter over a city through a set of hand gesture commands.
Keywords: Posture recognition; Gesture recognition; Scale Invariant Feature Transform (SIFT); K-means; Bag-of-features; Support Vector Machine (SVM); Human-computer interaction
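The vector quantization step described in this abstract can be illustrated with a minimal sketch. It assumes a codebook of cluster centers has already been learned (for example by K-means) and uses toy 2-D points in place of 128-dimensional SIFT descriptors; the function names `nearest_center` and `bag_of_features` are hypothetical and not taken from the thesis.

```python
import math


def nearest_center(descriptor, centers):
    """Return the index of the codebook center closest to the descriptor."""
    return min(range(len(centers)), key=lambda i: math.dist(descriptor, centers[i]))


def bag_of_features(descriptors, centers):
    """Map any number of descriptors to a fixed-length, L1-normalized histogram."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest_center(d, centers)] += 1.0
    total = sum(hist)
    # An image with no keypoints yields an all-zero histogram.
    return [h / total for h in hist] if total else hist


# Toy 2-D "descriptors" stand in for SIFT vectors (assumption for illustration).
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(0.5, 0.2), (9.0, 11.0), (9.5, 9.5), (1.0, 9.0)]
histogram = bag_of_features(descriptors, centers)
print(histogram)  # [0.25, 0.5, 0.25]
```

Whatever the number of keypoints in an image, the histogram has one bin per codebook entry, which is what lets a fixed-input classifier such as an SVM consume it.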
12	Partial 3D-shape indexing and retrieval
El Khoury, Rachid (22 March 2013)
A growing number of 3D graphic applications have an impact on today's society. These applications are used in several domains, ranging from digital entertainment and computer-aided design to medical applications. In this context, a 3D object search engine with good performance in both running time and result quality becomes mandatory. We propose a novel approach for 3D-model retrieval based on closed curves. We then enhance our method to handle partial 3D-model retrieval. Our method starts with the definition of an invariant mapping function. The important properties of a mapping function are its invariance to rigid and non-rigid transformations, the correct description of the 3D-model, its insensitivity to noise, its robustness to topology changes, and its independence from parameters. However, current state-of-the-art methods do not respect all these properties. To respect them, we define our mapping function based on the diffusion and commute-time distances. To prove the properties of this function, we compute the Reeb graph of the 3D-models. To describe the whole 3D-model using our mapping function, we generate indexed closed curves from a source point detected automatically at the center of the 3D-model. Each curve describes a small region of the 3D-model. These curves yield a descriptor that is invariant to different transformations. To show the robustness of our method on various classes of 3D-models in different poses, we use shapes from SHREC 2012. We also compare our approach to existing state-of-the-art methods on a dataset from SHREC 2010. For partial 3D-model retrieval, we enhance the proposed method using a Bag-of-Features built from all the extracted closed curves, and show its accuracy on the same dataset.
Keywords: 3D-models; Heat kernel; Diffusion distance; Commute time distance; Reeb graphs; Retrieval; Partial retrieval; Bag-of-features
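As a rough illustration of the commute-time distance mentioned in this abstract, the sketch below computes it on a small weighted graph rather than on a 3D mesh. It relies on the standard identity that the commute time between two vertices equals the graph volume (the sum of all vertex degrees) times their effective resistance; the helper names `solve` and `commute_time`, the toy path graph, and the assumptions of a connected graph with a zero diagonal are illustrative only and do not come from the thesis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def commute_time(weights, i, j):
    """Commute time between vertices i and j of a connected weighted graph."""
    if i == j:
        return 0.0
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Graph Laplacian L = D - W (weights is assumed to have a zero diagonal).
    laplacian = [[(degree[r] if r == c else 0.0) - weights[r][c] for c in range(n)]
                 for r in range(n)]
    keep = [k for k in range(n) if k != j]  # ground vertex j
    reduced = [[laplacian[r][c] for c in keep] for r in keep]
    rhs = [1.0 if k == i else 0.0 for k in keep]  # inject unit current at i
    potentials = solve(reduced, rhs)
    resistance = potentials[keep.index(i)]  # effective resistance R_ij
    return sum(degree) * resistance


# Path graph 0 - 1 - 2 with unit edge weights (toy example).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
print(commute_time(path, 0, 2))
```

On a mesh, the same quantity would be computed from a discrete Laplacian of the surface, which is where a heat-kernel formulation typically enters.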
13	Graphics Recognition using Spatial Relations and Shape Analysis / Reconnaissance de Graphiques en utilisant les Relations Spatiales et Analyse de la Forme
K. C., Santosh (28 November 2011)
In the current state of the art, symbol recognition usually means recognising isolated symbols. However, isolated symbol recognition methods are not always suitable for solving real-world problems. In the case of composite documents that contain textual and graphical elements, one needs to be able to extract and formalise the links that exist between the images and the surrounding text in order to exploit the information embedded in those documents.
In this context, we first introduce a method for graphics recognition based on dynamic programming matching of Radon features. This method exploits the properties of the Radon transform to capture both the boundary and the internal structure of shapes without compressing the pattern representation into a single vector that may miss important information. The method outperforms the major state-of-the-art shape descriptors but remains mainly suited to isolated symbol recognition. We therefore integrate it into a completely new approach for symbol recognition based on the spatio-structural description of a "vocabulary" of extracted visual primitives. The method is based on spatial relations between pairs of labelled vocabulary types (some of which can be characterised with the previously mentioned descriptor), which are then used as a basis for building an attributed relational graph (ARG) that describes symbols. Thanks to our labelling of attribute types, we avoid the general NP-hard graph matching problem. We provide a comprehensive comparison with other spatial relation models as well as with state-of-the-art approaches for graphics recognition, and show that our approach effectively combines structural and global statistical descriptors and outperforms them significantly.
In the final part of this thesis, we present a Bag-of-Features (BoF) approach using spatial relations, in which every possible pair of individual visual primitives is indexed by its topological configuration and the visual types of its components. This provides a way to retrieve isolated symbols as well as significant known parts of symbols by using either an isolated symbol or a collection of relations between the important visual primitives as a query. Finally, this opens perspectives towards natural-language-based symbol recognition processes.
Keywords: Radon Features; Dynamic Programming; Shape Descriptors; Visual Vocabulary; Spatial Relations; Spatial Bag-of-Features; Graphics Recognition
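The abstract above matches Radon features by dynamic programming. The thesis's exact cost function and constraints are not given here, so the following is only a generic dynamic time warping sketch under stated assumptions: each shape is represented as a sequence of feature vectors (for instance one vector per projection angle), the local cost is the Euclidean distance, and the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Cumulative cost of the best monotonic alignment between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of seq_a with the first j of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat an item of seq_b
                                     cost[i][j - 1],      # repeat an item of seq_a
                                     cost[i - 1][j - 1])  # advance in both sequences
    return cost[n][m]


# Two toy sequences that differ only by one repeated element.
a = [(0.0,), (1.0,), (2.0,)]
b = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(a, b))  # 0.0
```

Because the alignment may stretch either sequence, the two sequences do not need the same length, which avoids compressing a shape into a single fixed-size vector.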
14	Recherche multi-descripteurs dans les fonds photographiques numérisés / Multi-descriptor retrieval in digitalized photographs collections
Bhowmik, Neelanjan (07 November 2017)
Content-Based Image Retrieval (CBIR) is a discipline of computer science that aims at automatically structuring image collections according to visual criteria. The functionalities offered include efficient access to images in a large image database and the identification of their content through object detection and recognition tools. They affect a wide range of fields that handle this kind of data, such as multimedia, culture, security, health and scientific research.
Indexing an image from its visual content first requires producing a visual summary of this content for a given use, which will be the index of this image in the collection. The literature on image descriptors is now very rich: several families of descriptors exist, and within each family many approaches coexist. Many descriptors do not describe the same information and do not have the same invariance properties, so it can be relevant to combine them to describe the image content better. The combination can be implemented in different ways, depending on the descriptors involved and on the application. In this thesis, we focus on the family of local descriptors, with application to image and object retrieval by example in a collection of images. Their good properties make them very popular for retrieval, recognition and categorization of objects and scenes.
Two directions of research are investigated.
Feature combination applied to query-by-example image retrieval: the core of the thesis is the proposal of a model for combining low-level, generic descriptors in order to obtain a richer descriptor adapted to a given use case, while remaining generic enough to index different types of visual content. Since the application considered is query-by-example, another major difficulty is the complexity of the proposal, which must be compatible with short retrieval times even on large datasets. To meet these goals, we propose an approach based on the fusion of inverted indices, which represents the content better while being associated with an efficient access method.
Complementarity of the descriptors: we evaluate the complementarity of existing local descriptors by proposing statistical criteria for analysing their spatial distribution in the image. This work highlights a synergy between some of these techniques when they are judged sufficiently complementary. The spatial criteria are used in a prediction model based on linear regression, which has the advantage of selecting suitable descriptor combinations for the dataset as a whole and, more importantly, for each image. The approach is evaluated within the search engine based on the fusion of inverted indices, where it shows its relevance and also highlights that the optimal combination of descriptors may vary from one image to another.
Additionally, we use the two previous proposals to address the problem of cross-domain image retrieval, where images are matched across different domains, including multi-source and multi-date contents. Two applications of cross-domain matching are explored. First, cross-domain image retrieval is applied to the digitized cultural photographic collections of a museum, where it demonstrates its effectiveness for the exploration and promotion of these contents at different levels, from archiving to exhibition, in situ or ex situ. Second, we explore cross-domain image-based localization, where the pose of a landmark is estimated by retrieving geo-referenced images that are visually similar to the query images.
Keywords: Content-based image retrieval; Feature combination; Bag-of-Features; Inverted index; Spatial complementarity; Cross-domain image retrieval
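To make the idea of fusing inverted indices concrete, here is a minimal sketch rather than the model proposed in the thesis. It assumes one inverted index per descriptor type, a simple vote count as the per-index score, and a fixed weighted sum as the fusion rule; the descriptor names ("sift", "color"), the image identifiers, and all function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(image_words):
    """Map each visual word to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index


def query_scores(index, query_words):
    """Score each image by the number of distinct query words it shares."""
    scores = defaultdict(float)
    for w in set(query_words):
        for image_id in index.get(w, ()):
            scores[image_id] += 1.0
    return scores


def fuse(score_maps, weights):
    """Combine per-descriptor scores with a weighted sum and rank the images."""
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        for image_id, s in scores.items():
            fused[image_id] += weight * s
    # Highest score first; ties broken by image identifier for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy collection: visual-word ids per image for two hypothetical descriptor types.
sift_index = build_inverted_index({"a": [1, 2, 3], "b": [3, 4], "c": [7]})
color_index = build_inverted_index({"a": [10], "b": [10, 11], "c": [11, 12]})

ranking = fuse(
    [query_scores(sift_index, [3, 4]), query_scores(color_index, [11])],
    weights=[0.5, 0.5],
)
print(ranking)  # [('b', 1.5), ('a', 0.5), ('c', 0.5)]
```

Only the posting lists of the query's visual words are visited, which is why inverted indices keep retrieval times low on large collections. Per-image weights, as suggested by the complementarity analysis in the abstract, would replace the fixed `weights` list.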
15	Partial 3D-shape indexing and retrieval / Indexation partielle de modèles 3D
El Khoury, Rachid (22 March 2013)
A growing number of 3D graphic applications have an impact on today's society. These applications are used in several domains, ranging from digital entertainment and computer-aided design to medical applications. In this context, a 3D object search engine with good performance in both running time and result quality becomes mandatory. We propose a novel approach for 3D-model retrieval based on closed curves. We then enhance our method to handle partial 3D-model retrieval. Our method starts with the definition of an invariant mapping function. The important properties of a mapping function are its invariance to rigid and non-rigid transformations, the correct description of the 3D-model, its insensitivity to noise, its robustness to topology changes, and its independence from parameters. However, current state-of-the-art methods do not respect all these properties. To respect them, we define our mapping function based on the diffusion and commute-time distances. To prove the properties of this function, we compute the Reeb graph of the 3D-models. To describe the whole 3D-model using our mapping function, we generate indexed closed curves from a source point detected automatically at the center of the 3D-model. Each curve describes a small region of the 3D-model. These curves yield a descriptor that is invariant to different transformations. To show the robustness of our method on various classes of 3D-models in different poses, we use shapes from SHREC 2012. We also compare our approach to existing state-of-the-art methods on a dataset from SHREC 2010. For partial 3D-model retrieval, we enhance the proposed method using a Bag-of-Features built from all the extracted closed curves, and show its accuracy on the same dataset.
Keywords: 3D-models; Heat kernel; Diffusion distance; Commute time distance; Reeb graphs; Retrieval; Partial retrieval; Bag-of-features
16	Automatické třídění fotografií podle obsahu / Automatic Photography Categorization
Gajová, Veronika (January 2012)
The purpose of this thesis is to design and implement a tool for automatic categorization of photos. The proposed tool is based on the Bag of Words classification method and is implemented as a plug-in for the XnView image viewer. The plug-in classifies a selected group of photos into predefined image categories. The assigned category is then written directly into the picture's IPTC metadata as a keyword.
17	Koncepty strojového učení pro kategorizaci objektů v obrazu / Machine Learning Concepts for Categorization of Objects in Images
Hubený, Marek (January 2017)
This work focuses on object and scene recognition using machine learning and computer vision tools. Before addressing this problem, the basic phases of the machine learning concept and statistical models were studied, with emphasis on their division into discriminative and generative methods. Further, the Bag-of-words method and its modification were investigated and described. In the practical part of this work, the Bag-of-words method with an SVM classifier was implemented in the Matlab environment, and the model was tested on various sets of publicly available images.
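The Bag-of-words method referred to in these abstracts needs a visual vocabulary, which is commonly learned by K-means clustering of local descriptors. The sketch below is a plain Lloyd's-algorithm illustration in Python (the thesis above used Matlab), with a deterministic initialization from the first k points and toy 2-D data; the function name `kmeans` and these choices are assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Learn k cluster centers (a visual vocabulary) with Lloyd's algorithm."""
    # Deterministic initialization: the first k points (assumption for reproducibility).
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i in range(k):
            members = clusters[i]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


# Toy 2-D descriptors forming two well-separated groups.
points = [(0, 0), (10, 10), (0, 2), (10, 12)]
codebook = kmeans(points, k=2)
print(codebook)  # [(0.0, 1.0), (10.0, 11.0)]
```

Each learned center plays the role of one "visual word"; an image is then described by how often its descriptors fall nearest to each word, and that histogram is what a classifier such as an SVM is trained on.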
