Global ETD Search

91	Distortion correction for non-planar deformable projection displays through homography shaping and projected image warping Kio, Onoise Gerald January 2016 Video projectors have advanced from tools for delivering presentations on flat, planar surfaces to tools for delivering media content in applications such as augmented reality, simulated sports practice and invisible displays. The use of non-planar projection surfaces introduces geometric and radiometric distortions. This work addresses the correction of geometric distortions that occur when images or video frames are projected onto static and deformable non-planar display surfaces. The distortion-correction process involves (i) detecting feature points in the camera images and creating a desired shape of the undistorted view through a 2D homography, (ii) transforming the feature points on the camera images to control points on the projected images, and (iii) calculating Radial Basis Function (RBF) warping coefficients from the control points and warping the projected image to obtain an undistorted image of the projection on the projection surface. The novel aspects of this work include (i) developing a theoretical framework that explains the cause of distortion and provides a general warping pattern to be applied to the projection, (ii) carrying out the distortion-correction process without the use of a distortion-measuring calibration image or structured light pattern, (iii) carrying out the distortion-correction process on a projection display that deforms with time, using a single uncalibrated projector and a single uncalibrated camera, and (iv) optimising the distortion-correction process to operate in real time.
The geometric distortion-correction process designed in this work has been tested on both static projection systems, in which the components remain fixed in position, and dynamic projection systems, in which the positions of components or the shape of the display change with time. The results of these tests show that the geometric distortion-correction technique developed in this work improves the observed image geometry by as much as 31%, as measured by normalised correlation. Optimising the distortion-correction process improved its speed of operation by 98%, demonstrating the applicability of the proposed approach to real projection systems with deformable projection displays. 006.4 Computer-aided engineering ; Imaging
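The 31% figure above is reported in terms of a normalised correlation measure. The abstract does not define that measure, so the following Python sketch assumes the common zero-mean normalised correlation between a reference image and a camera capture, both flattened to equal-length lists of grey levels. The function name and the input layout are illustrative assumptions, not the thesis's implementation.

```python
import math


def normalised_correlation(reference, observed):
    """Zero-mean normalised correlation between two equal-length pixel lists.

    Assumed definition (not taken from the thesis): values lie in [-1, 1],
    where 1 means the two images agree up to brightness and contrast.
    """
    if len(reference) != len(observed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_r = sum(reference) / len(reference)
    mean_o = sum(observed) / len(observed)
    dev_r = [r - mean_r for r in reference]
    dev_o = [o - mean_o for o in observed]
    numerator = sum(r * o for r, o in zip(dev_r, dev_o))
    denominator = math.sqrt(sum(r * r for r in dev_r) * sum(o * o for o in dev_o))
    if denominator == 0.0:
        raise ValueError("correlation is undefined for a constant image")
    return numerator / denominator
```

Under this assumed definition, an improvement would be computed by comparing the score of the corrected projection against the score of the uncorrected one, both measured against the same reference image.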
92	Modular neural networks applied to pattern recognition tasks Gherman, Bogdan George January 2016 Pattern recognition has become an accessible tool for developing advanced adaptive products. The need for such products is not diminishing; on the contrary, requirements for systems that are increasingly aware of their environmental circumstances are constantly growing. Feed-forward neural networks are used to learn patterns in their training data without the need to discover by hand the relationships present in the data. However, the problem of estimating the required size of the neural network is still not solved. If we choose a neural network that is too small for a given task, the network is unable to "comprehend" the intricacies of the data. On the other hand, if we choose a network that is too big for the given task, there are too many parameters to be tuned, we can fall prey to the "curse of dimensionality" or, even worse, the training algorithm can easily become trapped in local minima of the error surface. Therefore, we investigate possible ways to find the 'Goldilocks' size for a feed-forward neural network (one that is just right in some sense), given a training set. Furthermore, we use a paradigm employed by the Roman Empire and applied on a wide scale in computer programming, the "Divide-et-Impera" approach: divide a given dataset into multiple sub-datasets, solve the problem for each sub-dataset, and fuse the results of all the sub-problems to form the result for the initial problem as a whole. To this end, we investigated modular neural networks and their performance. 006.4 Q Science ; T Technology
93	Biométrie faciale 3D par apprentissage des caractéristiques géométriques : application à la reconnaissance des visages et à la classification du genre / 3D facial biometric using geometric characteristics and machine learning : application to face recognition and gender classification Ballihi, Lahoucine 12 May 2012 Face biometrics has recently attracted growing interest from the scientific community and the biometrics industry because it is natural, contactless and non-intrusive. Nevertheless, the performance of systems based on 2D images is affected by various kinds of variability, such as pose, lighting conditions, occlusions and facial expressions. With the availability of 3D cameras able to capture three-dimensional shape, which is less sensitive to changes in illumination and pose, many research efforts have turned to this new modality. However, other challenges arise, such as deformations of the facial shape caused by expressions and the computation time required by the approaches developed. This thesis falls within this paradigm and proposes coupling Riemannian geometry with learning techniques for 3D face biometrics that is efficient and robust to changes in expression. After a pre-processing step, we propose representing facial surfaces by collections of 3D curves that capture their shape locally. We use an existing geometric framework to obtain the "optimal" deformations between curves as well as the distances separating them on a Riemannian manifold (the shape space of curves). We then apply learning techniques to determine the most relevant curves for two face-biometrics applications: identity recognition and gender classification.
The results obtained on the FRGC v2 reference benchmark, and their comparison with state-of-the-art work, confirm the value of coupling local shape analysis through a geometric approach (the ability to compute means, etc.) with learning techniques (Boosting, etc.) to gain in computation time and performance. / Since facial biometric recognition is contactless, non-intrusive and natural (i.e. more readily accepted by end-users), it has emerged as an attractive way to achieve identity recognition. Unfortunately, 2D-based face technologies (still image or image sequence) still face difficult challenges such as pose variations, changes in lighting conditions, occlusions and facial expressions. Over the last ten years, face recognition using the 3D shape of the face has become a major research area due to its robustness to lighting conditions and pose variations. Most state-of-the-art works have focused on the variability caused by facial deformations and proposed methods robust to such shape variations. Achieving good performance in automatic 3D face recognition and gender classification is an important issue when developing intelligent systems. In this thesis we propose a unified, fully automatic framework for 3D face recognition and gender classification. We propose to represent a 3D facial surface by a set of radial curves and iso-level curves. The proposed framework combines machine learning techniques (Boosting, etc.) and Riemannian geometry-based shape analysis in order to select relevant facial curves extracted from 3D facial surfaces. The feature selection step improves the performance of both our identity recognition and gender classification approaches. In addition, the set of relevant curves obtained provides a compact signature of the 3D face, which significantly reduces the computational cost and the storage requirements for face recognition and gender classification.
The main contributions of this thesis include: 1) A new geometric feature selection approach for efficient 3D face recognition, which exploits the most relevant characteristics to address the challenge of facial expressions. In particular, we are interested in selecting the facial curves that are most suitable for 3D face recognition by using machine learning techniques. 2) A new gender classification approach using the 3D face shape represented by collections of curves. In particular, we are interested in finding the set of facial curves that are most suitable for gender discrimination. Exhaustive experiments were conducted on the FRGCv2 database, the results were compared with those of state-of-the-art work, and the effectiveness of local geometric shape analysis of facial surfaces combined with machine learning techniques was outlined. Geodesic path ; Level curves 006.4
94	Learning speaker-specific characteristics with deep neural architecture Salman, Ahmad January 2012 Robust Speaker Recognition (SR) has long been a focus of attention for researchers. The advancement of speech-aided technologies, especially biometrics, highlights the necessity of foolproof SR systems. However, the performance of an SR system depends critically on the quality of the speech features used to represent the speaker-specific information. This research aims at extracting the speaker-specific information from Mel-frequency Cepstral Coefficients (MFCCs) using deep learning. Speech is a mixture of various information components, including linguistic information, speaker-specific information and information about the speaker's emotional state. Extracting features for each information component is necessary for robust performance in different speech-related tasks. However, almost all forms of speech representation carry all the information as a whole, which is responsible for the compromised performance of SR systems. Motivated by the ability of deep architectures to solve complex problems by learning high-level task-specific information in the data, we propose a novel Deep Neural Architecture (DNA) to extract speaker-specific information (SI) from MFCCs, a popular frequency-domain representation of the speech signal. A two-stage learning strategy is adopted, based on unsupervised training for network initialisation followed by regularised contrastive learning. To train our network in the second stage, we devise a contrastive loss function that discriminates between speakers on the basis of their intrinsic statistical patterns, distributed in the representations yielded by our deep network. This is achieved through contrastive pair-wise comparison of these representations for similar or dissimilar speakers.
To improve generalisation and reduce the interference of environmental effects with the speaker-specific representation, we regularise the contrastive loss with the data reconstruction loss in a multi-objective optimisation. A detailed study has been carried out to analyse the parameter space in training the proposed deep architecture for optimum performance. Finally, we compare the performance of our learned speaker-specific representations with several state-of-the-art techniques in speaker verification and speaker segmentation tasks. The results show that the representations acquired through the learned DNA are invariant and comparatively less sensitive to text, language and environmental variability. 006.4 Speaker Recognition ; Deep Learning
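The abstract does not give the form of the contrastive loss or of the multi-objective combination. As a rough illustration only, the Python sketch below swaps in the widely used margin-based pairwise contrastive loss and combines it with a mean-squared reconstruction error through a weighted sum. The weight `alpha`, the `margin` and all function names are assumptions, not the thesis's formulation.

```python
import math


def contrastive_loss(rep_a, rep_b, same_speaker, margin=1.0):
    """Generic margin-based pairwise loss (assumed form, not the thesis's).

    Pulls representations of the same speaker together and pushes those of
    different speakers at least `margin` apart.
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(rep_a, rep_b)))
    if same_speaker:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def reconstruction_loss(original, reconstructed):
    """Mean squared error between an input frame and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def combined_loss(rep_a, rep_b, same_speaker, original, reconstructed,
                  alpha=0.5, margin=1.0):
    """Weighted sum of the two objectives; `alpha` is a hypothetical trade-off."""
    return (alpha * contrastive_loss(rep_a, rep_b, same_speaker, margin)
            + (1.0 - alpha) * reconstruction_loss(original, reconstructed))
```

In this sketch the reconstruction term plays the role of the regulariser: it penalises representations that discard too much of the input, which is one plausible reading of how environmental interference could be limited.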
95	Smooth relevance vector machines Schmolck, Alexander January 2008 Regression tasks belong to the set of core problems faced in statistics and machine learning, and promising approaches can often be generalized to deal with classification, interpolation or denoising problems as well. Whereas the most widely used classical statistical techniques place severe a priori constraints on the type of function that can be approximated (e.g. only lines, in the case of linear regression), the successes of sparse kernel learners, such as the SVM (support vector machine), demonstrate that good results may be obtained in a quite general framework by enforcing sparsity. Similarly, even very simple sparsity-based denoising techniques, such as classical wavelet shrinkage, can produce surprisingly good results on a wide variety of signals because, unlike noise, most signals of practical interest share vital characteristics (such as smoothness, or the ability to be well approximated by piece-wise linear polynomials of a low order) that allow a sparse representation in wavelet space. On the other hand, results obtained from SVMs (and classical wavelet shrinkage) suffer from a certain lack of interpretability, since one cannot straightforwardly attach probabilities to them. By contrast, regression, and even more importantly classification, in a Bayesian context always entails a probabilistic measure of confidence in the results, which, provided the model assumptions are reasonably accurate, forms a basis for principled decision-making. The relevance vector machine (RVM) combines these strengths by explicitly encoding the criterion of model sparsity as a (Bayesian) prior over the model weights and offers a single, unified paradigm to deal efficiently with regression as well as classification tasks.
However, the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing, possibly even both at the same time (e.g. for the multiscale Doppler data). This thesis details an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. The resultant smooth RVM (sRVM) encompasses the original RVM as a special case, but empirical results with a variety of popular data sets show that in many cases it can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance. As the smoothness prior effectively makes it possible to use (highly efficient) wavelet kernels in an RVM setting, this work also unveils a strong connection between Bayesian wavelet shrinkage and RVM regression and further extends the applicability of the RVM to denoising tasks with up to millions of datapoints. We further discuss its applicability to classification tasks. 006.4
96	Data-independent vs. data-dependent dimension reduction for pattern recognition in high dimensional spaces Hassan, Tahir Mohammed January 2017 There has been a rapid emergence of new pattern recognition/classification techniques in a variety of real-world applications over the last few decades. In most pattern recognition/classification applications, the pattern of interest is modelled by a data vector/array of very high dimension. The main challenges in such applications relate to the efficiency of retrieving, analysing and verifying/classifying the pattern/object of interest. The “Curse of Dimension” refers to these challenges and is commonly addressed by Dimension Reduction (DR) techniques. Several DR techniques have been developed and implemented in a variety of applications. The most common DR schemes depend on a dataset of “typical samples” (e.g. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)). However, data-independent DR schemes (e.g. the Discrete Wavelet Transform (DWT) and Random Projections (RP)) are becoming more desirable because the density ratio of samples to dimension is often low. In this thesis, we critically review both types of techniques and highlight their advantages and disadvantages in terms of efficiency and impact on recognition accuracy. We study the theoretical justification for the existence of DR transforms that preserve, within tolerable error, distances between the would-be feature vectors modelling objects of interest. We observe that data-dependent DRs do not specifically attempt to preserve distances, and that the problems of overfitting and bias are consequences of the low density ratio of samples to dimension. Accordingly, our investigations focus more on data-independent DR schemes, and in particular on the different ways of generating RPs as an efficient DR tool.
RPs suitable for pattern recognition applications are restricted only by a lower bound on the reduced dimension that depends on the tolerable error. Besides the known RPs, which are generated according to some probability distribution, we investigate and test the performance of differently constructed over-complete Hadamard m×n (m ≪ n) submatrices, using the inductive Sylvester and Walsh-Paley methods. Our experimental work, conducted for two case studies (Speech Emotion Recognition (SER) and Gait-based Gender Classification (GBGC)), demonstrates that these matrices perform as well as, if not better than, data-dependent DR schemes. Moreover, dictionaries obtained by sampling the top rows of Walsh-Paley matrices outperform matrices constructed more randomly, but this may be influenced by the type of biometric and/or recognition scheme. We also propose feature-block (FB) based DR as an innovative way to overcome the problem of low density ratio in applications, and demonstrate its success for the SER case study. 006.4
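To make the Hadamard submatrix idea concrete, the Python sketch below builds a Hadamard matrix with Sylvester's inductive doubling construction and uses its first m rows as an m×n projection. It is a minimal illustration under stated assumptions: the rows are kept in natural Sylvester order (the Walsh-Paley ordering mentioned in the abstract is not implemented), the 1/sqrt(m) scaling is the conventional choice for ±1 projection matrices rather than something stated in the abstract, and the function names are hypothetical.

```python
import math


def sylvester_hadamard(order):
    """Build an order x order Hadamard matrix by Sylvester's doubling rule.

    H_1 = [1];  H_2k = [[H_k, H_k], [H_k, -H_k]].  `order` must be a power of two.
    """
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def hadamard_project(vector, m):
    """Reduce an n-dimensional vector to m dimensions with the top m Hadamard rows.

    Assumes len(vector) is a power of two; the 1/sqrt(m) scale is an assumption.
    """
    n = len(vector)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    rows = sylvester_hadamard(n)[:m]
    scale = 1.0 / math.sqrt(m)
    return [scale * sum(r * v for r, v in zip(row, vector)) for row in rows]
```

Because the matrix is deterministic and its entries are ±1, the projection needs no training data and only additions and subtractions, which is the efficiency argument usually made for data-independent schemes of this kind.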
97	3D human behavior understanding by shape analysis of human motion and pose / Compréhension de comportements humains 3D par l'analyse de forme de la posture et du mouvement Devanne, Maxime 01 December 2015 The emergence of depth sensors that capture the 3D structure of the scene and of the human body offers new possibilities for studying motion and understanding human behaviors. However, designing and developing behavior recognition modules that are both accurate and efficient is a difficult task because of the variability of human posture, the complexity of motion and interactions with the environment. In this thesis, we first focus on the action recognition problem by representing the trajectory of the human body over time, thereby capturing the body shape and the motion dynamics simultaneously. The action recognition problem is then formulated as computing the similarity between the shapes of trajectories in a Riemannian framework. Experiments carried out on four datasets demonstrate the potential of the solution in terms of action recognition accuracy and latency. Second, we extend the study to more complex behaviors by analyzing the evolution of the shape of the posture in order to decompose the sequence into motion units. Each motion unit is then characterized by the motion trajectory and the appearance around the hands, so as to describe the human motion and the interaction with objects. Finally, the sequence of temporal segments is modeled by a dynamic naive Bayesian classifier. Experiments carried out on four datasets evaluate the potential of the approach in different contexts of behavior recognition and online detection.
/ The emergence of RGB-D sensors providing the 3D structure of both the scene and the human body offers new opportunities for studying human motion and understanding human behaviors. However, the design and development of models for behavior recognition that are both accurate and efficient is a challenging task due to the variability of the human pose, the complexity of human motion and possible interactions with the environment. In this thesis, we first focus on the action recognition problem by representing human action as the trajectory of the 3D coordinates of human body joints over time, thus capturing the body shape and the dynamics of the motion simultaneously. The action recognition problem is then formulated as the problem of computing the similarity between the shapes of trajectories in a Riemannian framework. Experiments carried out on four representative benchmarks demonstrate the potential of the proposed solution in terms of accuracy and latency for low-latency action recognition. Second, we extend the study to more complex behaviors by analyzing the evolution of the human pose shape to decompose the motion stream into short motion units. Each motion unit is then characterized by the motion trajectory and the depth appearance around the hand joints, so as to describe the human motion and the interaction with objects. Finally, the sequence of temporal segments is modeled through a Dynamic Naive Bayesian Classifier. Experiments on four representative datasets evaluate the potential of the proposed approach in different contexts, including recognition and online detection of behaviors. RGB-D data ; Motion recognition 006.4
98	A modelling-oriented scheme for control chart pattern recognition De La Torre Gutiérrez, Héctor January 2017 Control charts are graphical tools that monitor and assess the performance of production processes, revealing abnormal (deterministic) disturbances when there is a fault. Simple patterns belonging to one of six types can be observed when a fault is occurring, and a Normal pattern when the process is performing under its intended conditions. Machine Learning algorithms have been implemented in this research to enable automatic identification of simple patterns. Two pattern generation schemes (PGSs) for synthesising patterns are proposed in this work. These PGSs ensure generality, randomness and comparability, and allow the further categorisation of the studied patterns. One of these PGSs was developed for processes that fulfil the NIID (normally, identically and independently distributed) condition, and the other for three first-order lagged time series models. The latter PGS was used as the basis for generating patterns of feedback-controlled processes. Control chart pattern recognition (CCPR) systems were proposed and studied for these three process types. Furthermore, taking recognition accuracy as the performance measure, the arrangement of input factors that achieved the highest accuracy for each of the CCPR systems was determined. In addition, a CCPR system for feedback-controlled processes was developed. 006.4
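The abstract does not name the six pattern types or describe the pattern generation schemes in detail. The Python sketch below therefore assumes three disturbance shapes that are common in the CCPR literature (trend, shift and cyclic) and adds them to NIID Gaussian noise. The parameter names, defaults and the `generate_pattern` function itself are hypothetical and only illustrate how synthetic training patterns of this kind can be produced.

```python
import math
import random

KNOWN_KINDS = {"normal", "trend", "shift", "cyclic"}


def generate_pattern(kind, length, sigma=1.0, magnitude=1.0, onset=0,
                     period=8, seed=None):
    """Return `length` observations: NIID Gaussian noise plus a disturbance.

    The disturbance starts at index `onset`; before that the process is in
    its Normal state. All shapes here are assumed, not taken from the thesis.
    """
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown pattern kind: {kind}")
    rng = random.Random(seed)
    series = []
    for t in range(length):
        noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        if kind == "normal" or t < onset:
            disturbance = 0.0
        elif kind == "trend":
            disturbance = magnitude * (t - onset)       # linear drift per step
        elif kind == "shift":
            disturbance = magnitude                     # sudden change in mean
        else:  # "cyclic"
            disturbance = magnitude * math.sin(2.0 * math.pi * (t - onset) / period)
        series.append(noise + disturbance)
    return series
```

Fixing `seed` makes a generated pattern reproducible, which is one simple way to support the comparability that the abstract asks of a pattern generation scheme.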
99	3D dynamic facial sequences analysis for face recognition and emotion detection / Analyse de séquences faciales 3D pour la reconnaissance d’identité et la détection des émotions Alashkar, Taleb 02 November 2015 The study carried out in this thesis examines the role of the dynamics of 3D facial shapes in revealing people's identity and their emotional states. To this end, we proposed a geometric framework for studying 3D facial shapes and their dynamics over time. A 3D sequence is first divided into short sub-sequences, and each resulting sub-sequence is then represented on a Grassmann manifold (the set of linear subspaces of fixed dimension). We exploited the geometry of these manifolds to compare 3D sub-sequences, compute statistics (such as means) and quantify the divergence between elements of the same Grassmann manifold. We also proposed two representations for the two target applications: (1) the first is based on dictionaries (of subspaces) combined with Dictionary Learning and Sparse Coding techniques for identity recognition, and (2) the second represents sequences as time-parameterized trajectories on Grassmann manifolds, coupled with a variant of the SVM classification algorithm that allows learning from partial data, for the early detection of spontaneous emotions. Experiments carried out on the public BU-4DFE, Cam3D and BP4D-Spontaneous datasets show the value of both the proposed geometric framework (in terms of computation time and robustness to noise and missing data) and the representations adopted (dictionaries for identity recognition and trajectories for the early detection of spontaneous emotions). / In this thesis, we have investigated the problems of identity recognition and emotion detection from animations of 3D facial shapes (called 4D faces).
In particular, we have studied the role of facial shape dynamics in revealing a person's identity and the spontaneous emotion they exhibit. To this end, we have adopted a comprehensive geometric framework for analyzing 3D faces and their dynamics across time. That is, a sequence of 3D faces is first split into an indexed collection of short-term sub-sequences, each represented as a matrix (subspace); these are elements of a special matrix manifold called the Grassmann manifold (the set of k-dimensional linear subspaces). The geometry of the underlying space is used to compare the 3D sub-sequences effectively, compute statistical summaries (e.g. the sample mean) and densely quantify the divergence between subspaces. Two different representations have been proposed to address the problems of face recognition and emotion detection. They are, respectively, (1) a dictionary (of subspaces) representation associated with Dictionary Learning and Sparse Coding techniques and (2) a time-parameterized curve (trajectory) representation on the underlying space associated with the Structured-Output SVM classifier for early emotion detection. Experimental evaluations conducted on the publicly available BU-4DFE, BP4D-Spontaneous and Cam3D Kinect datasets illustrate the effectiveness of these representations and of the algorithmic solutions for identity recognition and emotion detection proposed in this thesis. Automatic face recognition ; Automatic pain detection 006.4
100	Efficient hand orientation and pose estimation for uncalibrated cameras Asad, M. January 2017 We propose a staged probabilistic regression method that is capable of learning well from a number of variations within a dataset. The proposed method is based on a multi-layered Random Forest, in which the first layer consists of a single marginalization weights regressor and the second layer contains an ensemble of expert learners. The expert learners are trained in stages, where each stage involves training an expert learner and adding it to the intermediate model. After every stage, the intermediate model is evaluated to reveal a latent variable space defining a subset that the model has difficulty learning from. This subset is used to train the next expert regressor. The posterior probabilities for each training sample are extracted from each expert regressor. These posterior probabilities are then used, along with a Kullback-Leibler divergence-based optimization method, to estimate the marginalization weights for each regressor. A marginalization weights regressor is trained using CDF and the estimated marginalization weights. We show the extension of our work to simultaneous hand orientation and pose inference. The proposed method outperforms the state of the art for marginalization of multi-layered Random Forests and for hand orientation inference. Furthermore, we show that a method which learns simultaneously from hand orientation and pose outperforms pose classification, as it is better able to capture the variations in pose induced by viewpoint changes. 006.4
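The abstract does not describe how the Kullback-Leibler divergence-based optimisation is set up. The Python sketch below only illustrates the two ingredients it names: combining expert posteriors with marginalisation weights, and scoring the combined posterior against a target distribution with the KL divergence. The function names, the restriction to two experts and the coarse grid search are assumptions made for illustration, not the method of the thesis.

```python
import math


def marginalise(expert_posteriors, weights):
    """Weighted mixture of expert posteriors: p(y) = sum_k w_k * p_k(y)."""
    if len(expert_posteriors) != len(weights):
        raise ValueError("one weight per expert is required")
    n_bins = len(expert_posteriors[0])
    return [sum(w * p[i] for w, p in zip(weights, expert_posteriors))
            for i in range(n_bins)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def best_two_expert_weight(target, expert_a, expert_b, steps=100):
    """Grid-search the weight on expert_a that minimises KL(target || mixture).

    A deliberately simple stand-in for the optimisation in the abstract.
    """
    best_w, best_kl = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        mixture = marginalise([expert_a, expert_b], [w, 1.0 - w])
        kl = kl_divergence(target, mixture)
        if kl < best_kl:
            best_w, best_kl = w, kl
    return best_w
```

In a staged scheme like the one described, weights estimated this way for each training sample could then serve as regression targets for the first-layer marginalisation weights regressor; that link is an interpretation of the abstract rather than a documented detail.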
