81 |
Toward Understanding Human Expression in Human-Robot Interaction
Miners, William Ben, January 2006
Intelligent devices are quickly becoming necessities to support our activities during both work and play. We are already bound in a symbiotic relationship with these devices. An unfortunate effect of the pervasiveness of intelligent devices is the substantial investment of our time and effort to communicate intent. Even though our increasing reliance on these intelligent devices is inevitable, the limits of conventional methods for devices to perceive human expression hinder communication efficiency. These constraints restrict the usefulness of intelligent devices to support our activities. Our communication time and effort must be minimized to leverage the benefits of intelligent devices and seamlessly integrate them into society. Minimizing the time and effort needed to communicate our intent will allow us to concentrate on tasks in which we excel, including creative thought and problem solving.

An intuitive method to minimize human communication effort with intelligent devices is to take advantage of our existing interpersonal communication experience. Recent advances in speech, hand gesture, and facial expression recognition provide alternate viable modes of communication that are more natural than conventional tactile interfaces. Use of natural human communication eliminates the need to adapt and invest time and effort in the less intuitive techniques required for traditional keyboard- and mouse-based interfaces.

Although the state of the art in natural but isolated modes of communication achieves impressive results, significant hurdles must be overcome before communication with devices in our daily lives will feel natural and effortless. Research has shown that combining information between multiple noise-prone modalities improves accuracy. Leveraging this complementary and redundant content will improve communication robustness and relax current unimodal limitations.

This research presents and evaluates a novel multimodal framework to help reduce the total human effort and time required to communicate with intelligent devices. This reduction is realized by determining human intent using a knowledge-based architecture that combines and leverages conflicting information available across multiple natural communication modes and modalities. The effectiveness of this approach is demonstrated using dynamic hand gestures and simple facial expressions characterizing basic emotions. It is important to note that the framework is not restricted to these two forms of communication. The framework presented in this research provides the flexibility necessary to include additional or alternate modalities and channels of information in future research, including improving the robustness of speech understanding.

The primary contributions of this research include the leveraging of conflicts in a closed-loop multimodal framework, explicit use of uncertainty in knowledge representation and reasoning across multiple modalities, and a flexible approach for leveraging domain-specific knowledge to help understand multimodal human expression. Experiments using a manually defined knowledge base demonstrate an improved average accuracy of individual concepts and an improved average accuracy of overall intents when leveraging conflicts as compared to an open-loop approach.
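As a purely illustrative aside (not taken from the thesis, and not its knowledge-based architecture), the core idea of fusing intent hypotheses from two modalities while detecting conflict can be sketched in a few lines; all labels, weights and thresholds below are invented:

```python
# Illustrative sketch only: combines intent scores from two modalities and
# flags conflicts so a closed-loop system could re-weight a channel or ask
# for clarification. Labels and the threshold are assumptions.

def fuse_modalities(gesture_scores, face_scores, conflict_threshold=0.6):
    """gesture_scores / face_scores: dicts mapping intent label -> confidence in [0, 1]."""
    labels = set(gesture_scores) | set(face_scores)
    fused = {}
    for label in labels:
        g = gesture_scores.get(label, 0.0)
        f = face_scores.get(label, 0.0)
        fused[label] = g * f                      # agreement reinforces a label
    total = sum(fused.values())
    conflict = 1.0 - total                        # little shared mass = high conflict
    if total > 0:
        fused = {k: v / total for k, v in fused.items()}
    return fused, conflict > conflict_threshold

if __name__ == "__main__":
    gesture = {"stop": 0.7, "wave": 0.3}
    face = {"stop": 0.2, "wave": 0.8}
    scores, in_conflict = fuse_modalities(gesture, face)
    print(scores, "conflict:", in_conflict)
```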
|
83 |
3D Hand Tracking in Video Sequences
Tokatli, Aykut, 01 September 2005 (PDF)
The use of hand gestures provides an attractive alternative to cumbersome interface devices such as the keyboard, mouse, and joystick. Hand tracking has great potential as a tool for better human-computer interaction by enabling communication in a more natural and articulate way. This has motivated a very active research area concerned with computer vision-based analysis and interpretation of hand gestures and hand tracking.
In this study, a real-time hand tracking system is developed. It is primarily image-based, relying on 2D image information. Coloured markers are used to separate and identify the finger parts. To obtain 3D tracking, a stereo vision approach is used in which the third dimension is obtained from depth information. To visualize the results in 3D, a 3D hand model is developed, with Java 3D as the 3D environment.
Tracking is tested on two different types of camera: a cheap USB web camera, and a Sony FCB-IX47AP camera connected to a Matrox Meteor frame grabber, both on a standard Intel Pentium-based personal computer. Coding is done in Borland C++ Builder 6.0, and the Intel Image Processing and Open Source Computer Vision (OpenCV) libraries are used as well. For both camera types, tracking is found to be robust and efficient, with hand tracking achieved at ~8 fps.
Although the current progress is encouraging, further theoretical as well as computational advances are needed for this highly complex task of hand tracking.
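As a hedged illustration of the stereo step mentioned above (not the author's code; the focal length and baseline are placeholder values), the depth of a matched marker in a rectified stereo pair follows the standard relation Z = f * B / d:

```python
# Minimal sketch of depth recovery from stereo disparity for a matched marker.
# Assumes rectified cameras; focal length and baseline below are placeholders.

def depth_from_disparity(x_left, x_right, focal_px=800.0, baseline_m=0.12):
    """Return depth Z in metres for a point seen at x_left / x_right (pixels)."""
    disparity = x_left - x_right            # pixels; larger disparity = closer point
    if disparity <= 0:
        raise ValueError("point must have positive disparity in a rectified pair")
    return focal_px * baseline_m / disparity   # Z = f * B / d

if __name__ == "__main__":
    print(depth_from_disparity(412.0, 396.0))  # ~6 m away with these placeholder values
```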
|
84 |
Human Action Recognition in Video Data for Surveillance Applications
Gurrapu, Chaitanya, January 2004
Detecting human actions using a camera has many possible applications in the security industry. When a human performs an action, his/her body goes through a signature sequence of poses. To detect these pose changes and hence the activities performed, a pattern recogniser needs to be built into the video system. Due to the temporal nature of the patterns, Hidden Markov Models (HMMs), used extensively in speech recognition, were investigated. Initially a gesture recognition system was built using novel features. These features were obtained by approximating the contour of the foreground object with a polygon and extracting the polygon's vertices. A Gaussian Mixture Model (GMM) was fit to the vertices obtained from a few frames and the parameters of the GMM itself were used as features for the HMM. A more practical activity detection system was then built, using a more sophisticated foreground segmentation algorithm that is immune to varying lighting conditions and to permanent changes in the foreground. The foreground segmentation algorithm models each of the pixel values using clusters and continually uses incoming pixels to update the cluster parameters. Cast shadows were identified and removed by assuming that shadow regions were less likely to produce strong edges in the image than real objects and that this likelihood further decreases after colour segmentation. Colour segmentation itself was performed by clustering together pixel values in the feature space using a gradient ascent algorithm called mean shift. More robust features in the form of mesh features were also obtained by dividing the bounding box of the binarised object into grid elements and calculating the ratio of foreground to background pixels in each of the grid elements. These features were vector quantized to reduce their dimensionality, and the resulting symbols were presented as features to the HMM, achieving a recognition rate of 62% for an event involving a person writing on a white board. The recognition rate increased to 80% for the "seen" person sequences, i.e. the sequences of the person used to train the models. With a fixed lighting position, the lack of a shadow removal subsystem improved the detection rate. This is because of the consistent profile of the shadows in both the training and testing sequences due to the fixed lighting positions. Even with a lower recognition rate, the shadow removal subsystem was considered an indispensable part of a practical, generic surveillance system.
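As a rough, illustrative sketch of the mesh-feature idea described above (the grid size and toy mask are assumptions, not the thesis implementation), the per-cell foreground ratios can be computed like this:

```python
import numpy as np

# Illustrative sketch of "mesh features": divide the bounding box of a
# binarised object into a grid and use the foreground ratio of each cell
# as a feature vector. Grid size is an assumption.

def mesh_features(binary_mask, grid=(4, 4)):
    """binary_mask: 2-D array of 0/1 covering the object's bounding box."""
    features = []
    for row_block in np.array_split(binary_mask, grid[0], axis=0):
        for cell in np.array_split(row_block, grid[1], axis=1):
            features.append(cell.mean())    # ratio of foreground pixels in the cell
    return np.array(features)               # length grid[0] * grid[1]

if __name__ == "__main__":
    mask = np.zeros((40, 32), dtype=np.uint8)
    mask[10:30, 8:24] = 1                   # a toy "person" blob
    print(mesh_features(mask).round(2))
```

The resulting vector would then be vector quantized before being presented to the HMM, as the abstract describes.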
|
85 |
Fusion tardive asynchrone appliquée à la reconnaissance des gestes / Asynchronous late fusion applied to gesture recognition
Saade, Philippe, 11 May 2017
Dans cette thèse, nous nous intéressons à la reconnaissance de l'activité humaine. Nous commençons par proposer notre propre définition d'une action : une action est une séquence prédéfinie de gestes simples et concaténés. Ainsi, des actions similaires sont composées par les mêmes gestes simples. Chaque réalisation d'une action (enregistrement) est unique. Le corps humain et ses articulations vont effectuer les mêmes mouvements que celles d'un enregistrement de référence, avec des variations d'amplitude et de dynamique ne devant pas dépasser certaines limites qui conduiraient à un changement complet d'action. Pour effectuer nos expérimentations, nous avons capturé un jeu de données contenant des variations de base, puis fusionné certains enregistrements avec d'autres actions pour former un second jeu induisant plus de confusion au cours de la classification. Ensuite, nous avons capturé trois autres jeux contenant des propriétés intéressantes pour nos expérimentations avec la Fusion Tardive Asynchrone (ou Asynchronous Late Fusion notée ALF). Nous avons surmonté le problème des petits jeux non discriminants pour la reconnaissance d'actions en étendant un ensemble d'enregistrements effectués par différentes personnes et capturés par une caméra RGB-D. Nous avons présenté une nouvelle méthode pour générer des enregistrements synthétiques pouvant être utilisés pour l'apprentissage d'algorithmes de reconnaissance de l'activité humaine. La méthode de simulation a ainsi permis d'améliorer les performances des différents classifieurs. Un aperçu général de la classification des données dans un contexte audiovisuel a conduit à l'idée de l'ALF. En effet, la plupart des approches dans ce domaine classifient les flux audio et vidéo séparément, avec des outils différents. Chaque séquence temporelle est analysée séparément, comme dans l'analyse de flux audiovisuels, où la classification délivre des décisions à des instants différents. Ainsi, pour déduire la décision finale, il est important de fusionner les décisions prises séparément, d'où l'idée de la fusion asynchrone. Donc, nous avons trouvé intéressant d'appliquer l'ALF à des séquences temporelles. Nous avons introduit l'ALF afin d'améliorer la classification temporelle appliquée à des algorithmes de fusion tardive tout en justifiant l'utilisation d'un modèle asynchrone lors de la classification des données temporelles. Ensuite, nous avons présenté l'algorithme de l'ALF et les paramètres utilisés pour l'optimiser. Enfin, après avoir mesuré les performances de classifications avec différents algorithmes et jeux de données, nous avons montré que l'ALF donne de meilleurs résultats qu'une solution synchrone simple. Etant donné qu'il peut être difficile d'identifier les jeux de données compatibles avec l'ALF, nous avons construit des indicateurs permettant d'en extraire des informations statistiques. / In this thesis, we took interest in human action recognition. Thus, it was important to define an action. We proposed our own definition: an action is a predefined sequence of concatenated simple gestures. The same actions are composed of the same simple gestures. Every performance of an action (recording) is unique. Hence, the body and the joints will perform the same movements as the reference recording, with changes of dynamicity of the sequence and amplitude in the DOF. We note that the variations in the amplitude and dynamicity must not exceed certain boundaries in order not to lead to entirely different actions. 
For our experiments, we captured a dataset composed of actions containing basic variations. We merged some of those recordings with other actions to form a second dataset, consequently inducing more confusion during classification than the previous one. We also captured three other datasets with properties that are interesting for our experiments with the ALF (Asynchronous Late Fusion). We overcame the problem of non-discriminatory action datasets for action recognition by enlarging a set of recordings performed by different persons and captured by an RGB-D camera. We presented a novel method for generating synthetic recordings for training action recognition algorithms. We analyzed the parameters of the method and identified the most appropriate ones for the different classifiers. The simulation method improved performance when classifying the different datasets. A general overview of data classification, starting from the audio-visual context, led to the ALF idea. In fact, most approaches in the domain classify sound and video streams separately, with different tools. Every temporal sequence from a recording is analyzed separately, as in audiovisual stream analysis, where classification outputs decisions at various time instants. Therefore, to infer the final decision, it is important to fuse the decisions that were taken separately, hence the idea of asynchronous fusion. As a result, we found it interesting to apply the ALF to temporal sequences. We introduced the ALF model to improve the classification of temporal events with late fusion classification algorithms, and we showed the reason for using an asynchronous model when classifying datasets with temporal properties. We then introduced the algorithm behind the ALF and the parameters used to tune it. Finally, based on performances computed with different algorithms and datasets, we showed that the ALF improves on a simple synchronous solution in most cases. As it can be difficult for the user of the ALF solution to determine which datasets are compatible with the ALF, we built indicators that compare datasets by extracting statistical information from the recordings. We developed two indexes, the ASI and the ASIP, combined into a final index (the ASIv), to provide information about the compatibility of a dataset with the ALF. We evaluated the performance of the ALF on the segmentation of action series and compared the results of the synchronous and ALF solutions; the method we proposed improved performance. We analyzed human movement and gave a general definition of an action; later, we refined this definition and proposed a "visual definition" of an action. With the aid of the ALF model, we focus on the parts and joints of an action that are the most discriminant and display them in an image. In the end, we proposed multiple paths for future studies. The most important ones are:
- working on a process to find the ALF's number of parts using the ASIv;
- reducing the complexity by finding the discriminant joints and features thanks to the ALF properties;
- studying the MD-DTW features in depth, since the algorithm depends on the choice of features;
- implementing a DNN for comparison purposes;
- developing the confidence coefficient.
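As a toy, hedged sketch of the general late-fusion idea the thesis builds on (this is not the ALF algorithm; the labels, timestamps and grouping window are invented), decisions emitted asynchronously by different modalities can be grouped and merged afterwards:

```python
# Toy sketch of late fusion over decisions emitted at different time instants.
# Only illustrates the general idea, not the ALF itself; values are invented.

def late_fuse(decisions, window=0.5):
    """decisions: list of (timestamp_s, modality, label, confidence)."""
    decisions = sorted(decisions)
    fused, group = [], []
    for d in decisions:
        if group and d[0] - group[0][0] > window:   # too far apart: close the group
            fused.append(_merge(group))
            group = []
        group.append(d)
    if group:
        fused.append(_merge(group))
    return fused

def _merge(group):
    scores = {}
    for _, _, label, conf in group:
        scores[label] = scores.get(label, 0.0) + conf   # sum confidences per label
    best = max(scores, key=scores.get)
    return group[0][0], best, scores[best]

if __name__ == "__main__":
    stream = [(0.10, "skeleton", "wave", 0.6), (0.35, "depth", "wave", 0.7),
              (1.20, "skeleton", "point", 0.8), (1.40, "depth", "swipe", 0.5)]
    print(late_fuse(stream))
```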
|
86 |
Avaliação das técnicas de segmentação, modelagem e classificação para o reconhecimento automático de gestos e proposta de uma solução para classificar gestos da Libras em tempo real / Evaluation of segmentation, modeling and classification techniques for automatic gesture recognition and a proposed solution for classifying Libras gestures in real time
Anjo, Mauro dos Santos, 22 October 2013
Universidade Federal de São Carlos / Multimodal interfaces are becoming popular and aim to enhance the user experience through natural forms of interaction. Among these forms are speech and gesture input. Speech recognition is already a common feature of our daily lives, but gesture recognition has only recently come into wide use as a new form of interaction. Brazilian Sign Language (Libras) was recently recognized as a legal means of communication when the Brazilian Government enacted Law No. 10.436 on 04/24/2002, and it has also recently become a mandatory subject in teacher-education programmes and an elective subject in undergraduate courses through Decree No. 5.626 of 12/22/2005. In this context, this dissertation presents a study of all the steps necessary to build a complete system for recognizing static and dynamic Libras gestures, namely: Segmentation; Modeling and Interpretation; and Classification. Results and proposed solutions are presented for each of these steps, and the system is evaluated on the task of real-time recognition of static and dynamic gestures within a finite set of Libras gestures. All the solutions presented in this dissertation were embedded in the GestureUI software, whose main goal is to simplify research in the field of gesture recognition by allowing communication with multimodal interfaces through a TCP/IP protocol. / Interfaces multimodais estão cada vez mais populares e buscam a interação natural como recurso para enriquecer a experiência do usuário. Dentre as formas de interação natural, estão a fala e os gestos. O reconhecimento de fala já está presente em nosso dia a dia em variadas aplicações, porém o reconhecimento de gestos apareceu recentemente como uma nova forma de interação. A Linguagem Brasileira de Sinais (Libras) foi recentemente reconhecida como meio de comunicação e expressão através da Lei N˚10.436 de 24/04/2002, e também foi incluída como disciplina obrigatória em cursos de formação de professores e optativa em cursos de graduação através do Decreto N˚5.626 de 22/12/2005. Neste contexto, esta dissertação apresenta um estudo sobre todas as etapas necessárias para a construção de um sistema para reconhecimento de Gestos Estáticos e Dinâmicos da Libras, sendo estas: Segmentação; Modelagem e Identificação; e Reconhecimento. Resultados e soluções propostas serão apresentados para cada uma destas etapas, e o sistema será avaliado no reconhecimento em tempo real utilizando um conjunto finito de gestos estáticos e dinâmicos. Todas as soluções apresentadas nesta dissertação foram encapsuladas no Software GestureUI, que tem por objetivo simplificar as pesquisas na área de reconhecimento de gestos permitindo a comunicação com interfaces multimodais através de um protocolo TCP/IP.
|
87 |
Sistema de Rastreamento da Mão Humana Utilizando Visão Artificial para Aplicações Embarcadas / Human Hand Tracking System Using Computer Vision for Embedded Applications
Rodrigo Fernandes Freitas, 25 February 2011
Coordenação de Aperfeiçoamento de Nível Superior / Nos últimos anos a capacidade de processamento dos dispositivos portáteis tem aumentado muito, permitindo-lhes processar aplicações, tais como jogos, antes somente possíveis em plataformas de maior poder computacional. Porém, a interface com o usuário não tem acompanhado essa evolução do poder computacional, sendo realizada ainda por meio de teclados não ergonômicos. Esta dissertação propõe um sistema de interação para dispositivos portáteis baseado em Visão Computacional. Este sistema rastreia a mão do usuário e reconhece seis possíveis gestos: apontamento, zoom in, zoom out, rotação horária, rotação anti-horária e arrastar. Inicialmente o sistema captura imagens da mão do usuário, aplica filtros de pré-processamento sobre estas e segmenta a região da pele através de limiarização. Feito isto, o contorno da mão é extraído e representado em um vetor pelo algoritmo do código em cadeia. As pontas dos dedos são localizadas a partir do contorno representado e, através de um conjunto de regras, o gesto realizado pelo usuário é reconhecido. O sistema proposto é simulado utilizando a plataforma Simulink e implementado em linguagem C ANSI. Além disto, este sistema é comparado com três outros sistemas descritos na literatura com base em quatro critérios de avaliação: custo computacional, invariância à rotação para o gesto de apontamento, robustez à presença de regiões no fundo da imagem com cor próxima à da pele e robustez à oclusão com regiões de cor próxima à da pele. Os resultados indicam que este sistema atende os requisitos dos critérios de avaliação, portanto, sendo possível sua utilização em dispositivos portáteis. / In recent years the processing power of portable devices has increased greatly, allowing them to run applications, such as games, that were previously only possible on platforms with greater computational power. However, the user interface has not kept pace with this evolution in computing power and is still based on non-ergonomic keypads. This dissertation proposes an interaction system for portable devices based on Computer Vision. The system tracks the user's hand and recognizes six possible gestures: pointing, zoom in, zoom out, clockwise rotation, counter-clockwise rotation and drag. The system first captures images of the user's hand, applies pre-processing filters to them and segments the skin region by thresholding. The hand contour is then extracted and represented as a vector by the chain-code algorithm. The fingertips are located from this contour and, through a set of rules, the gesture performed by the user is recognized. The proposed system is simulated on the Simulink platform and implemented in ANSI C. It is also compared with three other systems described in the literature on four evaluation criteria: computational cost, rotation invariance for the pointing gesture, robustness to background regions with near-skin colour, and robustness to occlusion by regions of near-skin colour. The results indicate that the system meets the requirements of these evaluation criteria and is therefore suitable for use on portable devices.
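As a hedged illustration of the skin-segmentation-by-thresholding step described above (the bounds below are a common heuristic rule, not the thresholds used in the dissertation), a fixed RGB rule might look like this:

```python
import numpy as np

# Sketch of skin segmentation by thresholding in RGB space. The bounds are a
# widely used heuristic, not the author's pipeline or parameters.

def skin_mask(rgb_image):
    """rgb_image: H x W x 3 uint8 array; returns a boolean skin mask."""
    r = rgb_image[..., 0].astype(int)
    g = rgb_image[..., 1].astype(int)
    b = rgb_image[..., 2].astype(int)
    return (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & (abs(r - g) > 15)

if __name__ == "__main__":
    frame = np.zeros((4, 4, 3), dtype=np.uint8)
    frame[1:3, 1:3] = (200, 120, 90)        # a skin-coloured patch
    print(skin_mask(frame).astype(int))
```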
|
88 |
Uma linguagem de domínio específico para descrição e reconhecimento de gestos usando sensores de profundidade / A domain-specific language for gesture description and recognition using depth sensors
VIANA, Daniel Leite, 10 August 2015
Sistemas baseados em gestos vêm se tornando uma alternativa para o desenvolvimento
de aplicações mais intuitivas para os usuários, pois permitem a esses usuários interagirem de forma mais natural. Tais sistemas, em geral, requerem dispositivos de captura junto com alguma técnica de reconhecimento para que os gestos requeridos na interação natural sejam reconhecidos. A ausência de abstrações apropriadas para representação dos gestos dificulta as especificações de novas interações naturais. A representação de um gesto, quase sempre, envolve Aprendizagem de Máquina ou um avançado algoritmo de reconhecimento baseado nos dados da posição tridimensional do corpo humano fornecidos por sensores de profundidade, tal como o Microsoft Kinect. Além disso, as aplicações desenvolvidas tornam-se dependentes das bibliotecas de desenvolvimento dos dispositivos. Dessa forma, se o dispositivo for substituído por outro mais moderno ou de fabricante diferente quase todo o algoritmo de reconhecimento precisa ser reescrito. O principal objetivo desta dissertação é a especificação e implementação da Linguagem para Especificação de Gestos (LEG), uma Domain-Specific Language (DSL) para a especificação e reconhecimento de gestos livres do corpo humano com suporte a diferentes dispositivos de profundidade. A LEG é uma linguagem declarativa, baseada na análise das interfaces gestuais para computador e no estudo das abstrações e representações do movimento humano, a fim de reduzir a complexidade no desenvolvimento de aplicações baseadas em gestos. A implementação da linguagem foi realizada em duas etapas. Primeiro, foi criado
um framework (Kinect Gesture) com a lógica para rastrear e identificar gestos descritos na linguagem. Na segunda etapa, foi definida a gramática e o interpretador foi construído. A abordagem adotada foi de DSL externa, sendo sua sintaxe textual e particular. A fim de avaliar a implementação proposta, 15 (quinze) gestos foram especificados em LEG e reconhecidos. Tendo como referência os resultados obtidos, chegou-se a conclusão que a linguagem apresentada neste trabalho diminuiu consideravelmente a complexidade necessária para realizar a especificação e o
reconhecimento dos gestos. / Gesture-based systems are becoming an alternative for developing more intuitive applications, because they enable users to interact more naturally. These systems generally require capture devices together with some gesture recognition technique. The lack of appropriate abstractions for representing gestures makes it difficult to specify new natural interactions. To specify a gesture, it is almost always necessary to acquire advanced knowledge of the gesture recognition field and skills with the chosen device, and it is for this reason that the development of gestures is restricted. Developers often use Machine Learning to support building a gesture database. Another approach is to create a recognition algorithm based on data from the Kinect depth sensor. Furthermore, due to the nature of the software development kits (SDKs) provided by hardware vendors for building gesture-based applications, the developed applications often become tightly coupled to the SDK. The result is that significant portions of the application need to be rewritten to run it on another device.
The main goal of this dissertation is to implement and evaluate GSL (Gesture Specific Language), a Domain-Specific Language for the specification and identification of gestures with support for different depth sensors. GSL is a declarative programming language based on an analysis of gestural computer interfaces and a study of abstractions and representations of human movement, with the aim of reducing the complexity of developing gesture-based applications. The development was conducted in two phases: in the first, a framework (Kinect Gesture) was implemented with the logic for tracking and identifying gestures; in the second, a grammar was defined and a compiler was built. We adopted an external DSL approach, with its own textual syntax. To evaluate the proposed implementation, we used GSL to specify and recognize fifteen gestures. The results show that GSL considerably reduced the complexity of specifying and recognizing gestures.
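Purely as an illustration of what a declarative gesture specification can look like (this syntax is invented and is not the LEG/GSL grammar), a "wave" gesture could be described as an ordered list of pose constraints over depth-sensor skeleton joints:

```python
# Invented, illustrative specification only: NOT the LEG/GSL syntax, just one
# way of describing a gesture declaratively as ordered pose constraints.

WAVE_RIGHT = {
    "name": "wave_right",
    "timeout_ms": 2000,
    "steps": [
        {"joint": "right_hand", "above": "right_elbow"},                        # hand raised
        {"joint": "right_hand", "right_of": "right_elbow", "min_offset_m": 0.15},
        {"joint": "right_hand", "left_of": "right_elbow", "min_offset_m": 0.15},
    ],
}

def matches(step, skeleton):
    """skeleton: dict joint -> (x, y, z) in metres; checks a single constraint."""
    jx, jy, _ = skeleton[step["joint"]]
    if "above" in step:
        return jy > skeleton[step["above"]][1]
    ref_x = skeleton[step.get("right_of") or step.get("left_of")][0]
    offset = step.get("min_offset_m", 0.0)
    return jx > ref_x + offset if "right_of" in step else jx < ref_x - offset

if __name__ == "__main__":
    pose = {"right_hand": (0.40, 1.30, 2.0), "right_elbow": (0.20, 1.10, 2.0)}
    print(matches(WAVE_RIGHT["steps"][0], pose))   # True: hand is above the elbow
```

A runtime engine would then advance through the "steps" list as successive skeleton frames satisfy each constraint, independently of which depth sensor produced them.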
|
89 |
Reconnaissance de gestes et actions pour la collaboration homme-robot sur chaîne de montage / Recognition of gestures and actions for man and robot collaboration on assembly line
Coupeté, Eva, 10 November 2016
Les robots collaboratifs sont de plus en plus présents dans nos vies quotidiennes. En milieu industriel, ils sont une solution privilégiée pour rendre les chaînes de montage plus flexibles, rentables et diminuer la pénibilité du travail des opérateurs. Pour permettre une collaboration fluide et efficace, les robots doivent être capables de comprendre leur environnement, en particulier les actions humaines. Dans cette optique, nous avons décidé d’étudier la reconnaissance de gestes techniques afin que le robot puisse se synchroniser avec l’opérateur, adapter son allure et comprendre si quelque chose d’inattendu survient. Pour cela, nous avons considéré deux cas d’étude, un cas de co-présence et un cas de collaboration, tous les deux inspirés de cas existant sur les chaînes de montage automobiles. Dans un premier temps, pour le cas de co-présence, nous avons étudié la faisabilité de la reconnaissance des gestes en utilisant des capteurs inertiels. Nos très bons résultats (96% de reconnaissances correctes de gestes isolés avec un opérateur) nous ont encouragés à poursuivre dans cette voie. Sur le cas de collaboration, nous avons privilégié l’utilisation de capteurs non-intrusifs pour minimiser la gêne des opérateurs, en l’occurrence une caméra de profondeur positionnée avec une vue de dessus pour limiter les possibles occultations. Nous proposons un algorithme de suivi des mains en calculant les distances géodésiques entre les points du haut du corps et le haut de la tête. Nous concevons également et évaluons un système de reconnaissance de gestes basé sur des Chaînes de Markov Cachées (HMM) discrètes et prenant en entrée les positions des mains. Nous présentons de plus une méthode pour adapter notre système de reconnaissance à un nouvel opérateur et nous utilisons des capteurs inertiels sur les outils pour affiner nos résultats. Nous obtenons le très bon résultat de 90% de reconnaissances correctes en temps réel pour 13 opérateurs. Finalement, nous formalisons et détaillons une méthodologie complète pour réaliser une reconnaissance de gestes techniques sur les chaînes de montage. / Collaborative robots are becoming more and more present in our everyday life. In particular, within the industrial environment, they emerge as one of the preferred solutions for making factory assembly lines more flexible and cost-effective and for reducing the hardship of the operators’ work. However, to enable a smooth and efficient collaboration, robots should be able to understand their environment and in particular the actions of the humans around them. With this aim in mind, we decided to study technical gesture recognition. Specifically, we want the robot to be able to synchronize, adapt its speed and understand if something unexpected arises. We considered two use cases, one dealing with co-presence, the other with collaboration. Both are inspired by existing tasks on automotive assembly lines. First, for the co-presence use case, we evaluated the feasibility of technical gesture recognition using inertial sensors. We obtained a very good result (96% correct recognition with one operator), which encouraged us to pursue this idea. For the collaborative use case, we decided to focus on non-intrusive sensors to minimize the disturbance to the operators, and we chose to use a depth camera. We filmed the operators from a top view to prevent most of the potential occlusions. We introduce an algorithm that tracks the operator’s hands by calculating the geodesic distances between the points of the upper body and the top of the head. We also design and evaluate a recognition approach based on discrete Hidden Markov Models (HMMs) that takes the hand positions as input to recognize technical gestures. We propose a method to adapt our system to new operators, and we embedded inertial sensors on the tools to refine our results. We obtain the very good result of 90% correct recognition in real time for 13 operators. Finally, we formalize and detail a complete methodology to perform technical gesture recognition on assembly lines.
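As a generic, hedged sketch of the discrete-HMM scoring step mentioned above (the matrices are toy values, not the thesis parameters), the forward algorithm evaluates how well a sequence of quantized hand positions fits one gesture model:

```python
import numpy as np

# Generic discrete-HMM forward algorithm, shown only to illustrate how a
# sequence of quantized hand positions could be scored against one gesture
# model; all matrices below are toy values.

def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """obs: sequence of symbol indices; start_p (N,), trans_p (N,N), emit_p (N,M)."""
    alpha = start_p * emit_p[:, obs[0]]
    for symbol in obs[1:]:
        alpha = (alpha @ trans_p) * emit_p[:, symbol]   # predict, then weight by emission
    return np.log(alpha.sum())

if __name__ == "__main__":
    start = np.array([0.9, 0.1])
    trans = np.array([[0.7, 0.3], [0.0, 1.0]])           # left-to-right model
    emit = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])  # 3 quantized position symbols
    print(forward_log_likelihood([0, 0, 2, 2], start, trans, emit))
```

In practice one such model would be trained per gesture, and the model with the highest likelihood for the observed hand trajectory would be reported.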
|
90 |
The optimization of gesture recognition techniques for resource-constrained devices
Niezen, Gerrit, 26 January 2009
Gesture recognition is becoming increasingly popular as an input mechanism for human-computer interfaces. The availability of MEMS (Micro-Electromechanical System) 3-axis linear accelerometers allows for the design of an inexpensive mobile gesture recognition system. Wearable inertial sensors are a low-cost, low-power solution to recognize gestures and, more generally, track the movements of a person. Gesture recognition algorithms have traditionally only been implemented in cases where ample system resources are available, i.e. on desktop computers with fast processors and large amounts of memory. In the cases where a gesture recognition algorithm has been implemented on a resource-constrained device, only the simplest algorithms were implemented to recognize only a small set of gestures. Current gesture recognition technology can be improved by making algorithms faster, more robust, and more accurate. The most dramatic results in optimization are obtained by completely changing an algorithm to decrease the number of computations. Algorithms can also be optimized by profiling or timing the different sections of the algorithm to identify problem areas. Gestures have two aspects of signal characteristics that make them difficult to recognize: segmentation ambiguity and spatio-temporal variability. Segmentation ambiguity refers to not knowing the gesture boundaries, and therefore reference patterns have to be matched with all possible segments of input signals. Spatio-temporal variability refers to the fact that each repetition of the same gesture varies dynamically in shape and duration, even for the same gesturer. The objective of this study was to evaluate the various gesture recognition algorithms currently in use, after which the most suitable algorithm was optimized in order to implement it on a mobile device. Gesture recognition techniques studied include hidden Markov models, artificial neural networks and dynamic time warping. A dataset for evaluating the gesture recognition algorithms was gathered using a mobile device’s embedded accelerometer. The algorithms were evaluated based on computational efficiency, recognition accuracy and storage efficiency. The optimized algorithm was implemented in a user application on the mobile device to test the empirical validity of the study. / Dissertation (MEng)--University of Pretoria, 2009. / Electrical, Electronic and Computer Engineering
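As a minimal, generic illustration of one of the techniques named above, a plain dynamic time warping distance between two accelerometer traces can be written as follows; this is not the optimized implementation from the dissertation, and it directly addresses the spatio-temporal variability the abstract describes:

```python
# Minimal, generic dynamic time warping distance between two 1-D sequences.
# Shown only to illustrate the technique; not the dissertation's optimized,
# resource-constrained implementation.

def dtw_distance(seq_a, seq_b):
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                 cost[i][j - 1],       # deletion
                                 cost[i - 1][j - 1])   # match
    return cost[n][m]

if __name__ == "__main__":
    # Two repetitions of the "same" accelerometer trace, one slower than the other.
    print(dtw_distance([0, 1, 2, 3, 2, 1], [0, 1, 1, 2, 3, 3, 2, 1]))
```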
|