Global ETD Search

1	A contribution to mouth structure segmentation in images towards automatic mouth gesture recognition Gómez-Mendoza, Juan Bernardo 15 May 2012 (has links) (PDF) This document presents a series of elements for approaching the task of segmenting mouth structures in facial images, focusing on frames from video sequences. Each stage is treated in a separate chapter, from image pre-processing up to post-processing of the segmentation labeling, and the selection and development of the technique is discussed in each case. The methodological approach uses a color-based pixel classification strategy as the basis of the mouth structure segmentation scheme, complemented by a smart pre-processing and a later label refinement. The main contribution of this work, along with the segmentation methodology itself, is the development of a color-independent label refinement technique. The technique, which is similar to a linear low-pass filter in the segmentation labeling space followed by a nonlinear selection operation, improves the image labeling iteratively by filling small gaps and eliminating spurious regions resulting from a prior pixel classification stage. Results presented in this document suggest that the refiner is complementary to image pre-processing, so the two have a cumulative effect on segmentation quality. Finally, the segmentation methodology, comprising input color transformation, pre-processing, pixel classification and label refinement, is put to the test on mouth gesture detection in images, aimed at commanding three degrees of freedom of an endoscope holder. [SPI:OTHER] Engineering Sciences/Other; Medical Imaging; Endoscopy; Movement recognition; Image segmentation; Lips segmentation; Gesture classification; Human machine modelling; Human face modeling
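The label refinement described in the abstract above (a linear low-pass filter in label space followed by a nonlinear selection, applied iteratively) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the 3x3 box filter, the tie-breaking rule, and the names `refine_labels`, `num_classes` and `iterations` are assumptions chosen only for illustration.

```python
def refine_labels(labels, num_classes, iterations=1):
    """Illustrative label refinement (assumed design, not the thesis's method).

    labels: 2D list of integer class labels in range(num_classes).
    Each pass box-filters every class's indicator map (linear step), then
    keeps the class with the strongest filtered response (nonlinear step).
    """
    height, width = len(labels), len(labels[0])
    current = [row[:] for row in labels]
    for _ in range(iterations):
        updated = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                # Linear step: 3x3 box filter over each class's one-hot map,
                # computed as per-class counts in the neighbourhood.
                votes = [0] * num_classes
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            votes[current[ny][nx]] += 1
                # Nonlinear step: select the class with the largest response.
                # On a tie the pixel keeps its current label if it is among
                # the winners, otherwise the lowest winning index is used.
                best = max(votes)
                if votes[current[y][x]] == best:
                    updated[y][x] = current[y][x]
                else:
                    updated[y][x] = votes.index(best)
        current = updated
    return current
```

With this sketch, an isolated mislabeled pixel inside a uniform region is relabeled to match its neighbours, and a one-pixel hole is filled, which mirrors the gap-filling and spurious-region removal the abstract mentions. Note that the refinement only looks at labels, never at pixel colors.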
2	Fusion tardive asynchrone appliquée à la reconnaissance des gestes / Asyncronous late fusion applied to gesture recognition Saade, Philippe 11 May 2017 (has links) Dans cette thèse, nous nous intéressons à la reconnaissance de l'activité humaine. Nous commençons par proposer notre propre définition d'une action : une action est une séquence prédéfinie de gestes simples et concaténés. Ainsi, des actions similaires sont composées par les mêmes gestes simples. Chaque réalisation d'une action (enregistrement) est unique. Le corps humain et ses articulations vont effectuer les mêmes mouvements que celles d'un enregistrement de référence, avec des variations d'amplitude et de dynamique ne devant pas dépasser certaines limites qui conduiraient à un changement complet d'action. Pour effectuer nos expérimentations, nous avons capturé un jeu de données contenant des variations de base, puis fusionné certains enregistrements avec d'autres actions pour former un second jeu induisant plus de confusion au cours de la classification. Ensuite, nous avons capturé trois autres jeux contenant des propriétés intéressantes pour nos expérimentations avec la Fusion Tardive Asynchrone (ou Asynchronous Late Fusion notée ALF). Nous avons surmonté le problème des petits jeux non discriminants pour la reconnaissance d'actions en étendant un ensemble d'enregistrements effectués par différentes personnes et capturés par une caméra RGB-D. Nous avons présenté une nouvelle méthode pour générer des enregistrements synthétiques pouvant être utilisés pour l'apprentissage d'algorithmes de reconnaissance de l'activité humaine. La méthode de simulation a ainsi permis d'améliorer les performances des différents classifieurs. Un aperçu général de la classification des données dans un contexte audiovisuel a conduit à l'idée de l'ALF. En effet, la plupart des approches dans ce domaine classifient les flux audio et vidéo séparément, avec des outils différents. 
Chaque séquence temporelle est analysée séparément, comme dans l'analyse de flux audiovisuels, où la classification délivre des décisions à des instants différents. Ainsi, pour déduire la décision finale, il est important de fusionner les décisions prises séparément, d'où l'idée de la fusion asynchrone. Donc, nous avons trouvé intéressant d'appliquer l'ALF à des séquences temporelles. Nous avons introduit l'ALF afin d'améliorer la classification temporelle appliquée à des algorithmes de fusion tardive tout en justifiant l'utilisation d'un modèle asynchrone lors de la classification des données temporelles. Ensuite, nous avons présenté l'algorithme de l'ALF et les paramètres utilisés pour l'optimiser. Enfin, après avoir mesuré les performances de classifications avec différents algorithmes et jeux de données, nous avons montré que l'ALF donne de meilleurs résultats qu'une solution synchrone simple. Etant donné qu'il peut être difficile d'identifier les jeux de données compatibles avec l'ALF, nous avons construit des indicateurs permettant d'en extraire des informations statistiques. / In this thesis, we took interest in human action recognition. Thus, it was important to define an action. We proposed our own definition: an action is a predefined sequence of concatenated simple gestures. The same actions are composed of the same simple gestures. Every performance of an action (recording) is unique. Hence, the body and the joints will perform the same movements as the reference recording, with changes of dynamicity of the sequence and amplitude in the DOF. We note that the variations in the amplitude and dynamicity must not exceed certain boundaries in order not to lead to entirely different actions. For our experiments, we captured a dataset composed of actions containing basic variations. We merged some of those recordings with other actions to form a second dataset, consequently inducing more confusion than the previous one during the classification. 
We also captured three other datasets with properties that are interesting for our experiments with Asynchronous Late Fusion (ALF). We overcame the problem of small, non-discriminative datasets for action recognition by enlarging a set of recordings performed by different people and captured by an RGB-D camera. We presented a novel method for generating synthetic recordings for training action recognition algorithms. We analyzed the parameters of the method and identified the most appropriate ones for the different classifiers. The simulation method improved performance when classifying different datasets. A general overview of data classification, starting from the audio-visual context, led to the ALF idea. In fact, most approaches in that domain classify sound and video streams separately, with different tools. Every temporal sequence from a recording is analyzed separately, as in audiovisual stream analysis, where the classification outputs decisions at various time instants. Therefore, to infer the final decision, it is important to fuse the decisions that were taken separately, hence the idea of asynchronous fusion. As a result, we found it interesting to apply the ALF to temporal sequences. We introduced the ALF model to improve the classification of temporal events with late fusion classification algorithms. We justified the use of an asynchronous model when classifying datasets with temporal properties. Then, we introduced the algorithm behind the ALF and the parameters used to tune it. Finally, based on the performance measured with different algorithms and datasets, we showed that the ALF improves on the results of a simple synchronous solution in most cases. As it can be difficult for the user of the ALF solution to determine which datasets are compatible with the ALF, we built indicators to compare the datasets by extracting statistical information from the recordings. 
We developed two indexes, the ASI and the ASIP, combined into a final index (the ASIv) that provides information on the compatibility of a dataset with the ALF. We evaluated the performance of the ALF on the segmentation of action series and compared the results of the synchronous and ALF solutions. The method that we proposed improved performance. We analyzed human movement and gave a general definition of an action. Later, we improved this definition and proposed a "visual definition" of an action. With the aid of the ALF model, we focus on the body parts and joints that are the most discriminant for an action and display them in an image. Finally, we proposed multiple paths for future studies. The most important ones are: - Working on a process to find the ALF's number of parts using the ASIv. - Reducing the complexity by finding the discriminant joints and features thanks to the ALF properties. - Studying the MD-DTW features in depth, since the algorithm depends on the choice of the features. - Implementing a DNN for comparison purposes. - Developing the confidence coefficient. Fusion tardive; Reconnaissance de gestes; Classification de gestes; Analyse temporelle; Simulation des gestes; Late Fusion; Gesture Recognition; Gesture Classification; Temporal Analysis; Gesture Simulation
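The idea of fusing decisions that separate classifiers emit at different time instants can be sketched as follows. This is only an illustration of late fusion over unaligned decisions, not the ALF algorithm from the thesis: the stream names, the `(time, label, confidence)` tuple layout, the per-stream averaging and the function name `fuse_asynchronous_decisions` are all assumptions.

```python
from collections import defaultdict


def fuse_asynchronous_decisions(decisions_per_stream):
    """Fuse per-stream decisions without requiring aligned timestamps.

    decisions_per_stream: dict mapping a stream name (hypothetical, e.g. a
    body part) to a list of (time, label, confidence) tuples. Each stream
    may emit any number of decisions at any instants.
    Returns the label with the highest fused score.
    """
    fused = defaultdict(float)
    for decisions in decisions_per_stream.values():
        if not decisions:
            continue  # a stream that emitted nothing contributes nothing
        # Average within the stream so that a stream emitting many
        # decisions does not dominate one emitting few (assumed choice).
        per_label = defaultdict(float)
        for _time, label, confidence in decisions:
            per_label[label] += confidence
        for label, total in per_label.items():
            fused[label] += total / len(decisions)
    if not fused:
        raise ValueError("no decisions to fuse")
    return max(fused, key=fused.get)


# Hypothetical example: three streams deciding at unrelated instants.
example = {
    "left_arm": [(0.4, "wave", 0.9), (1.1, "wave", 0.7)],
    "right_arm": [(0.7, "clap", 0.6)],
    "torso": [(0.9, "wave", 0.5), (1.6, "clap", 0.4)],
}
fused_label = fuse_asynchronous_decisions(example)
```

A synchronous baseline would instead require every stream to produce a decision at the same instants before combining them; the sketch drops that constraint and ignores the timestamps entirely, which is the simplest possible asynchronous rule.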
3	A contribution to mouth structure segmentation in images towards automatic mouth gesture recognition / Une contribution à la segmentation structurale d’une image de la bouche par reconnaissance gestuelle automatique Gómez-Mendoza, Juan Bernardo 15 May 2012 (has links) Ce travail présente une nouvelle méthodologie pour la reconnaissance automatique des gestes de la bouche visant à l'élaboration d'IHM pour la commande d'endoscope. Cette méthodologie comprend des étapes communes à la plupart des systèmes de vision artificielle, comme le traitement d'image et la segmentation, ainsi qu'une méthode pour l'amélioration progressive de l'étiquetage obtenu grâce à la segmentation. Contrairement aux autres approches, la méthodologie est conçue pour fonctionner avec poses statiques, qui ne comprennent pas les mouvements de la tête. Beaucoup d'interêt est porté aux tâches de segmentation d'images, car cela s'est avéré être l'étape la plus importante dans la reconnaissance des gestes. En bref, les principales contributions de cette recherche sont les suivantes: La conception et la mise en oeuvre d'un algorithme de rafinement d'étiquettes qui dépend d'une première segmentation/pixel étiquetage et de deux paramétres corrélés. Le rafineur améliore la précision de la segmentation indiquée dans l'étiquetage de sortie pour les images de la bouche, il apporte également une amélioration acceptable lors de l'utilisation d'images naturelles. La définition de deux méthodes de segmentation pour les structures de la bouche dans les images; l'une fondée sur les propriétés de couleur des pixels, et l'autre sur des éléments de la texture locale, celles-ci se complétent pour obtenir une segmentation rapide et précise de la structure initiale. La palette de couleurs s'avére particuliérement importante dans la structure de séparation, tandis que la texture est excellente pour la séparation des couleurs de la bouche par rapport au fond. 
La dérivation d'une procédure basée sur la texture pour l'automatisation de la sélection des paramètres pour la technique de rafinement de segmentation discutée dans la première contribution. Une version améliorée de l'algorithme d'approximation bouche contour présentée dans l'ouvrage de Eveno et al. [1, 2], ce qui réduit le nombre d'itérations nécessaires pour la convergence et l'erreur d'approximation finale. La découverte de l'utilité de la composante de couleur CIE à statistiquement normalisée, dans la différenciation lévres et la langue de la peau, permettant l'utilisation des valeurs seuils constantes pour effectuer la comparaison. / This document presents a series of elements for approaching the task of segmenting mouth structures in facial images, focusing on frames from video sequences. Each stage is treated in a separate chapter, from image pre-processing up to post-processing of the segmentation labeling, and the selection and development of the technique is discussed in each case. The methodological approach uses a color-based pixel classification strategy as the basis of the mouth structure segmentation scheme, complemented by a smart pre-processing and a later label refinement. The main contribution of this work, along with the segmentation methodology itself, is the development of a color-independent label refinement technique. The technique, which is similar to a linear low-pass filter in the segmentation labeling space followed by a nonlinear selection operation, improves the image labeling iteratively by filling small gaps and eliminating spurious regions resulting from a prior pixel classification stage. Results presented in this document suggest that the refiner is complementary to image pre-processing, so the two have a cumulative effect on segmentation quality. 
Finally, the segmentation methodology, comprising input color transformation, pre-processing, pixel classification and label refinement, is put to the test on mouth gesture detection in images, aimed at commanding three degrees of freedom of an endoscope holder. Imagerie médicale; Endoscopie; Reconnaissance de mouvement; Vision artificielle; Segmentation d'images; Interface Homme Machine; Mouvement de la bouche; Mouvement des lèvres; Medical Imaging; Endoscopy; Movement recognition; Image segmentation; Lips segmentation; Gesture classification; Human machine modelling; Human face modeling 616.075 450 72
4	Development of a new technique for objective assessment of gestures in mini-invasive surgery / Développement d'une nouvelle technique pour l'évaluation objective des gestes en chirurgie mini-invasive Cifuentes Quintero, Jenny Alexandra 03 July 2015 (has links) L'une des tâches les plus difficiles de l'enseignement en chirurgie, consiste à expliquer aux étudiants quelles sont les amplitudes des forces et des couples à appliquer pour guider les instruments au cours d'une opération. Ce problème devient plus important dans le domaine de la chirurgie mini-invasive (MIS) où la perception de profondeur est perdue et le champ visuel est réduit. Pour cette raison, l'évaluation de l'habileté chirurgicale associée est devenue un point capital dans le processus d'apprentissage en médecine. Des problèmes évidents de subjectivité apparaissent dans la formation des médecins, selon l'instructeur. De nombreuses études et rapports de recherches concernent le développement de techniques automatisées d'évaluation du geste. La première partie du travail présenté dans cette thèse introduit une nouvelle méthode de classification de gestes médicaux 3D reposant sur des modèles cinématiques et biomécaniques. Celle-ci analyse de manière qualitative mais aussi quantitative les mouvements associés aux tâches effectuées. La classification du geste est réalisée en utilisant un paramétrage reposant sur la longueur d'arc pour calculer la courbure pour chaque trajectoire. Les avantages de cette approche sont l'indépendance du temps, un système de repérage absolu et la réduction du nombre de données. L'étude inclue l'analyse expérimentale de plusieurs gestes, obtenus avec plusieurs types de capteurs et réalisés par différents sujets. La deuxième partie de ce travail se concentre sur la classification reposant sur les données cinématiques et dynamiques. 
En premier lieu, une expression empirique, entre la géométrie du mouvement et les données cinématiques, sert à calculer une nouvelle variable appelée vitesse affine. Les expériences conduites dans ce travail de thèse montrent la nature constante de cette grandeur lorsque les gestes médicaux sont simples et identiques. Une dernière technique de classification a été implémentée en utilisant un calcul de l'énergie utilisée au cours de chaque segment du geste. Cette méthode a été validée expérimentalement en utilisant six caméras et un laparoscope instrumenté. La position 3-D de l'extrémité de l'effecteur a été enregistrée, pour plusieurs participants, en utilisant le logiciel OptiTrack Motive et des marqueurs réfléchissants montés sur le laparoscope. Les mesures de force et de couple, d'autre part, ont été acquises à l'aide des capteurs fixés sur l'outil et situés entre la pointe et la poignée de l'outil afin de capturer l'interaction entre le participant et le matériau manipulé. Les résultats expérimentaux présentent une bonne corrélation entre les valeurs de l'énergie et les compétences chirurgicales des participants impliqués dans ces expériences. / One of the most difficult tasks in surgical education is teaching students the optimal magnitude of the forces and torques needed to guide the instrument during an operation. This problem becomes even more relevant in the field of Mini Invasive Surgery (MIS), where depth perception is lost and the visual field is reduced. The evaluation of the surgical skills involved in this field has therefore become a critical point in the learning process. Nowadays, this assessment is performed by expert surgeons through observation in different operating rooms, which makes the results subjective, depending on the trainer in charge of the task. Research works around the world have focused on the development of automated evaluation techniques that provide objective feedback during the learning process. 
Therefore, the first part of this thesis describes a new method for classifying 3D medical gestures based on biomechanical models (kinematics). This new approach analyses medical gestures based on the smoothness and quality of the movements related to the tasks performed during medical training. Gesture classification is accomplished using an arc length parametrization to compute the curvature of each trajectory. The main advantages of this approach are time and location independence and problem simplification. The study included several gestures that were performed repeatedly by different subjects; these data sets were also acquired with three different devices. The second part of this work focuses on a classification technique based on kinematic and dynamic data. First, an empirical expression relating movement geometry and kinematic data is used to compute a different variable called the affine velocity. Experiments carried out in this work show the constant nature of this feature in basic medical gestures. Likewise, the results showed that classification based on this computation is adequate. Parameters found in previous experiments were taken into account to study more complex movements. Affine velocity was also used to segment pick and release tasks, and the classification stage was completed using an energy computation, based on dynamic data, for each segment. Final experiments were performed using six video cameras and an instrumented laparoscope. The 3-D position of the end effector was recorded, for each participant, using the OptiTrack Motive software and reflective markers mounted on the laparoscope. Force and torque measurements, on the other hand, were acquired using force and torque sensors attached to the instrument and located between the tool tip and the handle in order to capture the interaction between the participant and the manipulated material. 
The results of these experiments show a correlation between the energy values and the surgical skills of the participants. Courbure; Classification des gestes; Energie; Loi de puissance; Segmentation du mouvement; Vitesse affine; Cinématique; Biomécanique; Capteur; Analyse expérimentale; Curvature; Gesture classification; Energy; Power law; Movement Segmentation; Affine velocity; Kinematics; Biomechanics; Sensor; Experimental analysis 629.890 72
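Computing curvature from an arc length parametrization, as the abstract above describes, can be sketched on a sampled 3D trajectory. This is a generic finite-difference estimate under stated assumptions, not the thesis's implementation: the function name `curvature_by_arc_length` and the choice to measure the change of the unit tangent per unit of arc length between consecutive segments are illustrative.

```python
import math


def curvature_by_arc_length(points):
    """Estimate curvature at the interior samples of a 3D polyline.

    points: list of (x, y, z) tuples along a trajectory. Timestamps are
    deliberately not used: curvature is |dT/ds|, the rate of change of the
    unit tangent T with respect to arc length s, so it does not depend on
    how fast the path was traversed.
    """
    segments = []
    for p, q in zip(points, points[1:]):
        delta = [b - a for a, b in zip(p, q)]
        length = math.sqrt(sum(c * c for c in delta))
        if length == 0.0:
            raise ValueError("consecutive duplicate points are not supported")
        # Unit tangent of the segment and its arc length contribution.
        segments.append(([c / length for c in delta], length))

    curvatures = []
    for (t0, len0), (t1, len1) in zip(segments, segments[1:]):
        tangent_change = math.sqrt(sum((b - a) ** 2 for a, b in zip(t0, t1)))
        # Arc length between the midpoints of the two adjacent segments.
        curvatures.append(tangent_change / (0.5 * (len0 + len1)))
    return curvatures


# Example: samples on a circle of radius 2 in the plane z = 0.
circle = [(2 * math.cos(0.1 * k), 2 * math.sin(0.1 * k), 0.0) for k in range(50)]
circle_curvature = curvature_by_arc_length(circle)
```

Because only positions enter the computation, translating the trajectory or replaying it at a different speed leaves the estimate unchanged, which is consistent with the time and location independence the abstract claims for the approach. For the uniformly sampled circle above, the estimates sit at the reciprocal of the radius.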
