Global ETD Search

71	Sur la définition et la reconnaissance des formes planes dans les images numériques Musé, Pablo 01 October 2004 (has links) (PDF) This thesis addresses shape recognition in digital images. An appropriate shape representation is derived from the analysis of the perturbations that do not affect recognition: contrast changes, partial occlusion, noise, perspective. The atoms of this representation, called "shape elements", provide semi-local descriptions of shapes. Matching these elements makes it possible to recognize partial shapes. Global shapes are then defined as groups of partial shapes whose spatial arrangement is coherent. The fundamental aspect of this work is the establishment of unsupervised thresholds at every decision level of the recognition process. We propose decision rules for matching partial shapes as well as for detecting global shapes. The proposed framework is based on a general detection methodology in which an event is significant if it is unlikely to occur by chance. [MATH] Mathematics shape recognition level lines shape element normalization background model number of false alarms a contrario detection unsupervised classification shape grouping
72	Contribution à la comparaison de séquences d'images couleur par outils statistiques et par outils issus de la théorie algorithmique de l'information Leclercq, Thomas Macaire, Ludovic Delahaye, Jean-Paul Khoudour, Louahdi. January 2007 (has links) Reproduction of: Doctoral thesis: Automatic Control and Industrial Computing: Lille 1: 2006. / Order no. (Lille 1): 3940. Abstract in French and English. Title taken from the title page of the digitized document. Bibliography pp. [191]-201. List of publications.
73	Restauration des images par l'elimination du flou et des occlusions Whyte, Oliver 15 March 2012 (has links) (PDF) This thesis investigates the removal of spatially-variant blur from photographs degraded by camera shake, and the removal of large occluding objects from photographs of popular places. We examine these problems in the case where the photographs are taken with standard consumer cameras, and we have no particular information about the scene being photographed. Most existing deblurring methods model the observed blurry image as the convolution of a sharp image with a uniform blur kernel. However, we show that blur from camera shake is in general mostly due to the 3D rotation of the camera, resulting in a blur that can be significantly non-uniform across the image. We model this blur using a weighted set of camera poses, which induce homographies on the image being captured. The blur in a particular image is parameterised by the set of weights, which provides a compact global descriptor for the blur, analogous to a convolution kernel. This descriptor fully captures the spatially-variant blur at all pixels, and is able to model camera shake more accurately than previous methods. We demonstrate direct estimation of the blur weights from single and multiple blurry images captured by conventional cameras. This permits a sharp image to be recovered from a blurry "shaken" image without any user interaction or additional information about the camera motion. For single image deblurring, we adapt an existing marginalisation-based algorithm and a maximum a posteriori-based algorithm, which are both compatible with our model of spatially-variant blur. In order to reduce the computational cost of our homography-based model, we introduce an efficient approximation based on local uniformity of the blur. By grouping pixels into local regions which share a single PSF, we are able to take advantage of fast, frequency domain convolutions to perform the blur computation.
We apply this approximation to single image deblurring, obtaining an order of magnitude reduction in computation time with no visible reduction in quality. For deblurring images with saturated pixels, we propose a modification of the forward model to include this non-linearity, and re-derive the Richardson-Lucy algorithm with this new model. To prevent ringing artefacts from propagating in the deblurred image, we propose separate updates for those pixels affected by saturation, and those not affected. This prevents the loss of information caused by clipping from propagating to the rest of the image. In order to remove large occluders from photos, we automatically retrieve a set of exemplar images of the same scene from the Internet, using a visual search engine. We extract multiple homographies between each of these images and the target image to provide pixel correspondences. Finally we combine pixels from several exemplars in a seamless manner to replace the occluded pixels, by solving an energy minimisation problem on a conditional random field. Experimental results are shown on both synthetic images and real photographs captured by consumer cameras or downloaded from the Internet. computer vision deblurring
74	Statistiques Supervisées pour la Reconnaissance d'Actions Humaines dans les Vidéos Muneeb Ullah, Muhammad 23 October 2012 (has links) (PDF) This thesis addresses the problem of human action recognition in realistic video data, such as movies and online videos. Automatic and accurate recognition of human actions in video is a fascinating capability. The potential applications range from surveillance and robotics to medical diagnosis, content-based video retrieval, and intelligent human-computer interfaces. The task is highly challenging due to the large variations in person appearances, dynamic backgrounds, view-point changes, lighting conditions, action styles and other factors. Statistical video representations based on local space-time features have recently been shown to be successful for action recognition in realistic scenarios. Their success can be attributed to the mild assumptions about the data and robustness to several variations in the video. Such representations, however, often encode videos as a disordered collection of low-level primitives. This thesis extends current methods by developing more discriminative features and integrating additional supervision into Bag-of-Features based video representations, aiming to improve action recognition in unconstrained and challenging video data. We start by evaluating a range of available local space-time feature detectors and descriptors under the standard Bag-of-Features framework. We then propose to improve the basic Bag-of-Features model by integrating additional supervision in the form of non-local region-level information. We further investigate an attribute-based representation, wherein the attributes range from objects (e.g., car, chair, table, etc.) to human poses and actions. We demonstrate that such a representation captures high-level information in video, and provides complementary information to the low-level features.
We finally propose a novel local representation for human action recognition in video, denoted as Actlets. Actlets are body part detectors undergoing characteristic motion patterns. We train Actlets using a large synthetic video dataset of rendered avatars and demonstrate the advantages of Actlets for action recognition in realistic data. All methods proposed and developed in this thesis represent alternative ways of constructing supervised video representations and demonstrate improvements in human action recognition in realistic settings. computer vision action recognition
75	Optimization convexe pour cosegmentation Joulin, Armand 17 December 2012 (has links) (PDF) Humans and most animals have a natural ability to see the world and understand it effortlessly. The apparent ease with which a human perceives their surroundings suggests that the process involved does not, to some extent, require a high degree of reasoning. This observation suggests that our visual perception of the world can be simulated on a computer. Computer vision is the research field devoted to the problem of creating a form of visual perception for computers. The first work in this field dates back to the 1950s, but the computing power of the machines of that era did not make it possible to process and analyze the visual data needed to build a virtual visual perception. Only recently have computing power and storage capacity allowed this field to truly emerge. For two decades now, computer vision has addressed practical or industrial problems such as detecting faces, people behaving suspiciously in a crowd, or manufacturing defects on production lines. By contrast, regarding the emergence of a virtual visual perception that is not specific to a given task, little progress has been made and the community still faces fundamental problems. One of these problems is to segment an image or a video into meaningful regions, or in other words, into objects or actions. Scene segmentation is not only natural for humans but also essential for fully understanding one's environment. Unfortunately, it is also extremely difficult to reproduce on a computer.
One reason is that there is no clear definition of what a "meaningful" region is. Indeed, depending on the scene or situation, a region may have different interpretations. For example, given a street scene, distinguishing a pedestrian can be considered important, whereas their clothes do not necessarily seem so. If we now consider a scene taking place during a fashion show, a garment becomes an important element and therefore a meaningful region. In this thesis, we focus on this segmentation problem and approach it from a particular angle in order to avoid this fundamental difficulty. We consider segmentation as a weakly supervised learning problem: instead of segmenting images according to some predefined notion of "meaningful" regions, we develop methods that simultaneously segment a set of images into regions that appear regularly. In other words, we define a "meaningful" region from a statistical point of view: these are the regions that appear regularly across the given set of images. To this end, we design models whose scope goes beyond applications to vision. Our approach is rooted in statistical learning, whose goal is to design efficient methods for extracting and/or learning recurring patterns in datasets. This field has recently become very popular owing to the growth in the number and size of available databases and the need to process data automatically. In this thesis, we focus on methods designed to discover the "hidden" information in a database from incomplete or nonexistent annotations.
Finally, our work is also rooted in numerical optimization, which we use to develop efficient algorithms specially adapted to our problems. In particular, we use and adapt recently developed tools to relax complex combinatorial problems into convex problems for which the optimal solution is guaranteed to be found using procedures developed in convex optimization. We also illustrate the quality of our formulations and algorithms on problems drawn from fields other than computer vision. In particular, we show that our work can be used in text classification and cell biology. computer vision object recognition cosegmentation
76	Alignement élastique d'images pour la reconnaissance d'objet Duchenne, Olivier 29 November 2012 (has links) (PDF) The objective of this thesis is to explore the use of graph matching in object recognition systems. In the continuity of the previously described articles, rather than using descriptors invariant to misalignment, this work directly tries to find explicit correspondences between prototypes and test images, in order to build a robust similarity measure and infer the class of the test images. In chapter 2, we present a method that, given interest points in two images, tries to find correspondences between them. It extends previous graph matching approaches [Leordeanu and Hebert, 2005a] to handle interactions between more than two feature correspondences. This allows us to build a more discriminative and/or more invariant matching method. The main contributions of this chapter are: The introduction of a high-order objective function for hyper-graph matching (Section 2.3.1). The application of the tensor power iteration method to the high-order matching task, combined with a relaxation based on constraints on the row norms of assignment matrices, which is tighter than previous methods (Section 2.3.1). An l1-norm instead of the classical l2-norm relaxation, which provides solutions that are more interpretable but still allows an efficient power iteration algorithm (Section 2.3.5). The design of appropriate similarity measures that can be chosen either to improve the invariance of matching, or to improve the expressivity of the model (Section 2.3.6). The proposed approach has been implemented, and it is compared to state-of-the-art algorithms on both synthetic and real data. As shown by our experiments (Section 2.5), our implementation is, overall, as fast as these methods in spite of the higher complexity of the model, with better accuracy on standard databases. In chapter 3, we build a graph-matching method for object categorization.
The main contributions of this chapter are: Generalizing [Caputo and Jie, 2009; Wallraven et al., 2003], we propose in Section 3.3 to use the optimum value of the graph-matching problem associated with two images as a (non positive definite) kernel, suitable for SVM classification. We propose in Section 3.4 a novel extension of Ishikawa's method [Ishikawa, 2003] for optimizing MRFs which is orders of magnitude faster than competing algorithms (e.g., [Kim and Grauman, 2010; Kolmogorov and Zabih, 2004; Leordeanu and Hebert, 2005a]) for the grids with a few hundred nodes considered in this article. In turn, this allows us to combine our kernel with SVMs in image classification tasks. We demonstrate in Section 3.5 through experiments with standard benchmarks (Caltech 101, Caltech 256, and Scenes datasets) that our method matches and in some cases exceeds the state of the art for methods using a single type of features. In chapter 4, we introduce our work on object detection that performs fast image alignment. The main contributions of this chapter are: We propose a novel image similarity measure that allows for arbitrary deformations of the image pattern within some given disparity range and can be evaluated very efficiently [Lemire, 2006], with a cost equal to a small constant times that of correlation in a sliding-window mode. Our similarity measure relies on a hierarchical notion of parts based on simple rectangular image primitives and HOG cells [Dalal and Triggs, 2005a], and does not require manual part specification [Felzenszwalb and Huttenlocher, 2005b; Bourdev and Malik, 2009; Felzenszwalb et al., 2010] or automated discovery [Lazebnik et al., 2005; Kushal et al., 2007]. computer vision object recognition image matching
77	Learning Hierarchical Feature Extractors For Image Recognition Boureau, Y-Lan 01 September 2012 (has links) (PDF) Telling cow from sheep is effortless for most animals, but requires much engineering for computers. In this thesis, we seek to tease out basic principles that underlie many recent advances in image recognition. First, we recast many methods into a common unsupervised feature extraction framework based on an alternation of coding steps, which encode the input by comparing it with a collection of reference patterns, and pooling steps, which compute an aggregation statistic summarizing the codes within some region of interest of the image. Within that framework, we conduct extensive comparative evaluations of many coding or pooling operators proposed in the literature. Our results demonstrate a robust superiority of sparse coding (which decomposes an input as a linear combination of a few visual words) and max pooling (which summarizes a set of inputs by their maximum value). We also propose macrofeatures, which import into the popular spatial pyramid framework the joint encoding of nearby features commonly practiced in neural networks, and obtain significantly improved image recognition performance. Next, we analyze the statistical properties of max pooling that underlie its better performance, through a simple theoretical model of feature activation. We then present results of experiments that confirm many predictions of the model. Beyond the pooling operator itself, an important parameter is the set of pools over which the summary statistic is computed. We propose locality in feature configuration space as a natural criterion for devising better pools. Finally, we propose ways to make coding faster and more powerful through fast convolutional feedforward architectures, and examine how to incorporate supervision into feature extraction schemes.
Overall, our experiments offer insights into what makes current systems work so well, and state-of-the-art results on several image recognition benchmarks. computer vision object recognition feature extraction
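The alternation of coding and pooling steps described in entry 77 can be illustrated with a minimal sketch. It is not the thesis implementation: for brevity the coding step here is hard vector quantization (nearest reference pattern) rather than the sparse coding the abstract favors, and the names hard_code, max_pool, average_pool, codebook and region, as well as the toy values, are hypothetical.

```python
def hard_code(descriptor, codebook):
    """Coding step: one-hot vector marking the nearest reference pattern."""
    # Squared Euclidean distance from the descriptor to each reference pattern.
    dists = [sum((d - c) ** 2 for d, c in zip(descriptor, word)) for word in codebook]
    nearest = dists.index(min(dists))
    return [1.0 if k == nearest else 0.0 for k in range(len(codebook))]


def max_pool(codes):
    """Pooling step: per-dimension maximum over all codes in the region."""
    return [max(column) for column in zip(*codes)]


def average_pool(codes):
    """Pooling step: per-dimension mean over all codes in the region."""
    return [sum(column) / len(column) for column in zip(*codes)]


# Hypothetical toy data: two reference patterns, three descriptors in one region.
codebook = [[0.0, 0.0], [1.0, 1.0]]
region = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2]]
codes = [hard_code(d, codebook) for d in region]
pooled_max = max_pool(codes)      # records which patterns occur at all
pooled_avg = average_pool(codes)  # records how often each pattern occurs
```

In this sketch max pooling keeps only whether a reference pattern was activated somewhere in the region, whereas average pooling keeps its relative frequency; that difference is the kind of property the abstract says is analyzed theoretically.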
78	Modeling and visual recognition of human actions and interactions Laptev, Ivan 03 July 2013 (has links) (PDF) This work addresses the problem of recognizing actions and interactions in realistic video settings such as movies and consumer videos. The first contribution of this thesis (Chapters 2 and 4) is concerned with new video representations for action recognition. We introduce local space-time descriptors and demonstrate their potential to classify and localize actions in complex settings while circumventing the difficult intermediate steps of person detection, tracking and human pose estimation. The material on bag-of-features action recognition in Chapter 2 is based on publications [L14, L22, L23] and is related to other work by the author [L6, L7, L8, L11, L12, L13, L16, L21]. The work on object and action localization in Chapter 4 is based on [L9, L10, L13, L15] and relates to [L1, L17, L19, L20]. The second contribution of this thesis is concerned with weakly-supervised action learning. Chapter 3 introduces methods for automatic annotation of action samples in video using readily-available video scripts. It addresses the ambiguity of action expressions in text and the uncertainty of temporal action localization provided by scripts. The material presented in Chapter 3 is based on publications [L4, L14, L18]. Finally, Chapter 5 addresses interactions of people with objects and concerns modeling and recognition of object function. We exploit relations between objects and co-occurring human poses and demonstrate object recognition improvements using automatic pose estimation in challenging videos from YouTube. This part of the thesis is based on the publication [L2] and relates to other work by the author [L3, L5]. computer vision action recognition video analysis
79	Évaluation de système biométrique El Abed, Mohamad 09 December 2011 (has links) (PDF) Biometric systems are increasingly used to verify or determine an individual's identity. Given the stakes involved in their use, particularly for e-commerce applications, it is especially important to have a methodology for evaluating such systems. The problem addressed in this thesis is the design of a generic methodology for evaluating a biometric system. Three methods are proposed in this thesis: 1) a no-reference quality method for predicting the quality of a biometric sample, 2) a usage method for evaluating user acceptability and satisfaction when using biometric systems, and 3) a security analysis method for measuring a biometric system's robustness to attacks. EVALUATION PATTERN RECOGNITION (COMPUTER SCIENCE) IMAGE PROCESSING DIGITAL TECHNIQUES CLASSIFICATION
80	Fusion multimodale pour la reconnaissance d'espèces d'arbres / Multimodal fusion for leaf species recognition Ben Ameur, Rihab 04 June 2018 (has links) Information fusion systems combine data from different information sources while taking their quality into account. Combining data from heterogeneous sources makes it possible to exploit the complementarity of the data and thus potentially achieve higher performance than with a single source of information. Such systems are of interest for tree species recognition through the fusion of information from two modalities: leaves and bark. A single modality may itself provide several sources of information, each describing one of its most relevant characteristics. This makes it possible to reproduce the strategy adopted by botanists, who rely on these same criteria for recognition. Adopting this strategy helps to emphasize the educational aspect. In this setting, a fusion system can be envisaged to combine the data from a single modality as well as the data from the different modalities available. Tree species recognition is a real-world problem, since the photos of leaves and bark are taken in the natural environment. Processing this type of data is complicated by its specificities, due on the one hand to the nature of the objects to be recognized (age, inter-species similarity and intra-species variability) and on the other hand to the environment. Errors can accumulate throughout the process preceding fusion. The value of fusion is that it takes into account all the imperfections that may taint the available data and tries to model them well. Fusion is all the more effective when the data are well modeled. The theory of belief functions is one of the theoretical frameworks best suited to managing and representing uncertainty, imprecision, conflict, etc.
This theory owes its importance to its wealth of tools for managing the various sources of imperfection as well as the specificities of the available data. Within this theory, the data can be modeled through the construction of mass functions. The computational complexity can also be managed through approximations that reduce the number of focal elements. Conflict, one of the most common sources of imperfection, can be handled by selecting the most suitable combination rule. When fusing information sources with different degrees of reliability, the least reliable source may degrade the data from the most reliable one. One solution to this problem is to try to improve the performance of the least reliable source. Then, when fused with other sources, it provides useful information and in turn helps improve the performance of the fusion system. The performance of an information source can be improved by correcting its mass functions. In this setting, the correction can be based on measures of the relevance or sincerity of the source under study. Confusion matrices are a data source from which meta-knowledge characterizing the state of a source can be extracted. In this manuscript, the proposed fusion system is a hierarchical fusion system set up within the framework of belief function theory. It fuses the data from leaves and bark and offers the user a list of the most likely species while respecting the educational purpose of the application. The computational complexity of this fusion system is fairly low, which should make it possible, in the long term, to implement the application on a smartphone. Information fusion Pattern recognition Classification Multimodalities Belief function theory 004
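To make the notions of mass function and combination rule in entry 80 concrete, here is a minimal sketch of Dempster's rule of combination, one standard rule in belief function theory. The abstract does not say which rule the thesis selects, so this is an illustrative assumption; the function name dempster_combine, the species names and all mass values are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Returns the normalized combined mass function and the conflict mass.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                # Agreeing evidence: the product goes to the intersection.
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Disjoint focal elements: the product counts as conflict.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    # Normalize by the non-conflicting mass so the result sums to 1.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}, conflict


# Hypothetical frame of discernment and mass values for two modalities.
frame = frozenset({"oak", "beech", "birch"})
leaf = {frozenset({"oak"}): 0.6, frozenset({"oak", "beech"}): 0.3, frame: 0.1}
bark = {frozenset({"beech"}): 0.5, frozenset({"oak", "beech"}): 0.2, frame: 0.3}

fused, conflict = dempster_combine(leaf, bark)
```

Mass assigned to the whole frame expresses ignorance, which is how a less reliable source can be prevented from overriding a more reliable one; other combination rules redistribute the conflict mass differently.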
