Global ETD Search

11	Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification
Alqasrawi, Yousef T. N., Neagu, Daniel, Cowling, Peter I. January 2013
The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies, from automatically extracted image feature vectors, produces discriminative visual words, which can improve the accuracy of image categorization tasks. Most approaches that use the BOW model in categorizing images ignore useful information that can be obtained from image classes to build visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and improves accuracy in natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from training images of one scene image dataset can plausibly represent another scene image dataset on the same domain. This helps in reducing time and effort needed to build new visual vocabularies. The proposed approach is evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
Image classification Natural scenes Bag of visual words Integrated visual vocabulary Pyramidal colour moments Feature fusion Semantic modelling Local descriptors Retrieval Categorization Codebooks Representation Recognition Semantics Object
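The abstract above mentions support vector machines with a histogram intersection kernel. As a rough illustration only, the kernel itself can be sketched as below: it sums the bin-wise minima of two histograms. The toy visual-word counts, the 4-word vocabulary and the helper names are assumptions made for this sketch and are not taken from the paper.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection: sum of the element-wise minima of two histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def l1_normalize(hist):
    """Scale a non-negative histogram so its bins sum to 1 (all-zero input is copied unchanged)."""
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else list(hist)


def gram_matrix(histograms):
    """Pairwise kernel values, the form a precomputed-kernel SVM would take as input."""
    return [[histogram_intersection(a, b) for b in histograms] for a in histograms]


# Hypothetical visual-word counts for three images over a 4-word vocabulary.
bow_counts = [[4, 0, 2, 2], [3, 1, 2, 2], [0, 6, 1, 1]]
bow = [l1_normalize(h) for h in bow_counts]
K = gram_matrix(bow)
```

With L1-normalised histograms each image has kernel value 1 with itself, and images that share more visual-word mass score closer to 1.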
12	Contributions to facial feature extraction for face recognition / Contributions à l'extraction de caractéristiques pour la reconnaissance de visages
Nguyen, Huu-Tuan 19 September 2014
The most delicate task in a face recognition system is extracting meaningful, discriminative features. This thesis focuses on that task, with the objective of devising a facial representation that is robust to the major sources of variation: illumination, pose, time lapse, and differing image quality, such as low-resolution probe images from video surveillance. Fast processing, with a view to real-time use, is a further criterion. Several methods are proposed towards these ends. Firstly, based on the orientation characteristics of the main facial features, such as the eyes and mouth, a novel variant of the well-known LBP descriptor, referred to as ELBP, is designed to encode the micro-texture patterns contained in a horizontal ellipse. Secondly, ELBP is used to extract local features from oriented edge magnitude images, yielding the Elliptical Patterns of Oriented Edge Magnitudes (EPOEM) descriptor. Thirdly, a feature extraction method called Patch based Local Phase Quantization of Monogenic components (PLPQMC) is proposed, which integrates information obtained by monogenic filtering into the LPQ descriptor. Lastly, a facial representation called Local Patterns of Gradients (LPOG) is developed to capture meaningful features directly from gradient images. Each proposed descriptor is tested on three image databases: AR, FERET and SCface. Chief among these methods are PLPQMC and LPOG, which are inherently illumination invariant and blur tolerant and perform best, with recognition rates comparable or even superior to those of state-of-the-art methods. The methods also have low computational cost and are thus feasible to deploy in real-life applications.
Robust face recognition Local descriptors Feature extraction ELBP LPQ EPOEM PLPQMC LPOG Feature extraction for face recognition Local descriptors Local features ELBP Patch based LPQ Monogenic filter based EPOEM LPOG 620
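The abstract above describes ELBP as an LBP variant that samples micro patterns on a horizontal ellipse. The sketch below illustrates that general idea only: the semi-axes `a` and `b`, the number of sample points, the nearest-pixel sampling and the border clamping are all assumptions of this example, not the definition used in the thesis.

```python
import math


def elliptical_lbp_code(image, row, col, a=2, b=1, points=8):
    """LBP-style code for one pixel, with neighbours sampled on a horizontal ellipse.

    a, b: horizontal and vertical semi-axes in pixels (a > b gives a horizontal ellipse).
    Neighbours are read at the nearest pixel (no interpolation) and clamped to the image.
    """
    height, width = len(image), len(image[0])
    center = image[row][col]
    code = 0
    for k in range(points):
        angle = 2.0 * math.pi * k / points
        r = min(max(row - round(b * math.sin(angle)), 0), height - 1)
        c = min(max(col + round(a * math.cos(angle)), 0), width - 1)
        # Set bit k when the sampled neighbour is at least as bright as the centre.
        if image[r][c] >= center:
            code |= 1 << k
    return code


# Hypothetical 5x5 test image whose intensity grows from left to right.
ramp = [[c for c in range(5)] for _ in range(5)]
ramp_code = elliptical_lbp_code(ramp, 2, 2)
```

In a full descriptor such codes would typically be computed for every pixel and pooled into regional histograms. That step is omitted here.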
13	Indexation et recherche de contenus par objet visuel / Object-based visual content indexing and retrieval
Bursuc, Andrei 21 December 2012
With the ever-increasing amount of content available in video repositories, content-based retrieval of video objects is growing in difficulty and is becoming a mandatory feature for video search engines. This thesis presents a framework for retrieving user-defined video objects and makes two major contributions. The first is a methodological framework for retrieving instances of user-selected video objects, entitled DOOR (Dynamic Object Oriented Retrieval). The second concerns the support offered for video retrieval, namely the video navigation and retrieval system, its interface and its underlying architecture. Under the DOOR framework, the user-defined video object has a hybrid representation obtained by over-segmenting the frames, constructing region adjacency graphs and aggregating interest points. The identification of object instances across multiple videos is formulated as an energy optimization problem approximating an NP-hard problem. Object candidates are sub-graphs that yield an optimum energy with respect to the user-defined query. Four optimization strategies are proposed to obtain the optimum energy: Greedy, Relaxed Greedy, Simulated Annealing and GraphCut. The region-based object representation is further improved by aggregating interest points into a hybrid object representation. The similarity between an object and a frame is computed with a spectral matching technique that integrates both colorimetric and interest point descriptors. The DOOR framework scales to large video archives through the use of a Bag-of-Words representation enriched with a query definition and expansion mechanism based on a multi-modal, text-image-video principle. The proposed techniques are evaluated on multiple TRECVID video datasets, which demonstrates their effectiveness. The second contribution, OVIDIUS (On-line VIDeo Indexing Universal System), is an on-line video browsing and retrieval platform that supports users in video retrieval through video navigation, video retrieval and a graphical interface. The OVIDIUS platform features hierarchical video navigation functionalities that exploit the MPEG-7 standard for the structural description of video content. The DOOR framework is integrated in the OVIDIUS platform and provides the system's search functionalities. The major advantage of the proposed system is its modular architecture, which makes it possible to deploy the system on various terminals (both fixed and mobile), independently of the operating systems involved. The choice of technology for each module of the platform is justified by comparison with other technological options. Finally, different scenarios and use cases for the OVIDIUS platform are presented.
Content-based indexing Object retrieval Web services Multimedia content MPEG-7 Local descriptors Multimedia indexing Object representation Energy minimization Greedy Simulated annealing GraphCut Bag of words Query expansion Graph matching Multimodal search TRECVID Multimedia indexing platform Video navigation HTML5 Multi-terminal access
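The abstract above formulates object retrieval as finding sub-graphs of a region adjacency graph with optimum energy, and lists a Greedy strategy among its optimizers. The following is a minimal sketch of greedy sub-graph growth under stated assumptions: the energy function (L1 distance between the query histogram and the pooled histogram of the selected regions), the toy graph and all names are hypothetical and are not the thesis's formulation.

```python
def energy(selected, region_hists, query_hist):
    """Hypothetical energy: L1 distance between the query histogram and the
    normalised sum of the selected regions' histograms (lower is better)."""
    bins = len(query_hist)
    total = [sum(region_hists[r][i] for r in selected) for i in range(bins)]
    mass = sum(total)
    if mass == 0:
        return float("inf")
    return sum(abs(t / mass - q) for t, q in zip(total, query_hist))


def greedy_subgraph(adjacency, region_hists, query_hist):
    """Grow a connected set of regions, each step adding the neighbour that lowers the energy most."""
    seed = min(adjacency, key=lambda r: energy({r}, region_hists, query_hist))
    selected = {seed}
    best = energy(selected, region_hists, query_hist)
    while True:
        # Candidate regions are those adjacent to the current selection.
        frontier = {n for r in selected for n in adjacency[r]} - selected
        scored = [(energy(selected | {n}, region_hists, query_hist), n) for n in frontier]
        if not scored:
            break
        candidate_energy, candidate = min(scored)
        if candidate_energy >= best:
            break  # no neighbour improves the energy, so stop at this local optimum
        selected.add(candidate)
        best = candidate_energy
    return selected, best


# Hypothetical chain of four regions with 2-bin colour histograms.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
region_hists = {0: [8, 0], 1: [0, 8], 2: [6, 2], 3: [9, 1]}
selected, best = greedy_subgraph(adjacency, region_hists, [0.5, 0.5])
```

Greedy growth stops at the first local optimum. Strategies such as simulated annealing, also named in the abstract, are commonly used to escape such optima at higher computational cost.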
