1 |
3D human behavior understanding by shape analysis of human motion and pose / Compréhension de comportements humains 3D par l'analyse de forme de la posture et du mouvement
Devanne, Maxime (01 December 2015)
L'émergence de capteurs de profondeur capturant la structure 3D de la scène et du corps humain offre de nouvelles possibilités pour l'étude du mouvement et la compréhension des comportements humains. Cependant, la conception et le développement de modules de reconnaissance de comportements à la fois précis et efficaces est une tâche difficile en raison de la variabilité de la posture humaine, la complexité du mouvement et les interactions avec l'environnement. Dans cette thèse, nous nous concentrons d'abord sur le problème de la reconnaissance d'actions en représentant la trajectoire du corps humain au cours du temps, capturant ainsi simultanément la forme du corps et la dynamique du mouvement. Le problème de la reconnaissance d'actions est alors formulé comme le calcul de similitude entre la forme des trajectoires dans un cadre Riemannien. Les expériences menées sur quatre bases de données démontrent le potentiel de la solution en termes de précision/temps de latence de la reconnaissance d'actions. Deuxièmement, nous étendons l'étude aux comportements plus complexes en analysant l'évolution de la forme de la posture pour décomposer la séquence en unités de mouvement. Chaque unité de mouvement est alors caractérisée par la trajectoire de mouvement et l'apparence autour des mains, de manière à décrire le mouvement humain et l'interaction avec les objets. Enfin, la séquence de segments temporels est modélisée par un classifieur Bayésien naïf dynamique. Les expériences menées sur quatre bases de données évaluent le potentiel de l'approche dans différents contextes de reconnaissance et détection en ligne de comportements. / The emergence of RGB-D sensors providing the 3D structure of both the scene and the human body offers new opportunities for studying human motion and understanding human behaviors. 
However, designing and developing behavior recognition models that are both accurate and efficient is challenging because of the variability of human pose, the complexity of human motion, and possible interactions with the environment. In this thesis, we first focus on the action recognition problem by representing a human action as the trajectory of the 3D coordinates of the body joints over time, thus capturing the body shape and the dynamics of the motion simultaneously. The action recognition problem is then formulated as computing the similarity between the shapes of trajectories in a Riemannian framework. Experiments carried out on four representative benchmarks demonstrate the potential of the proposed solution in terms of accuracy and latency for low-latency action recognition. Second, we extend the study to more complex behaviors by analyzing the evolution of the human pose shape to decompose the motion stream into short motion units. Each motion unit is then characterized by the motion trajectory and the depth appearance around the hand joints, so as to describe both the human motion and the interaction with objects. Finally, the sequence of temporal segments is modeled with a Dynamic Naive Bayesian Classifier. Experiments on four representative datasets evaluate the potential of the proposed approach in different contexts, including recognition and online detection of behaviors.
|
2 |
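Estudio y mejora de métodos de registro 3D: aceleración sobre unidades de procesamiento gráfico y caracterización del espacio de transformaciones iniciales / Study and improvement of 3D registration methods: acceleration on graphics processing units and characterization of the space of initial transformations
Montoyo-Bojo, Javier (13 November 2015)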
In recent years, graphics processing units (GPUs) have been used increasingly in general-purpose applications, beyond the purpose for which they were created, namely rendering computer graphics. This growth is due in part to the evolution of these devices over that period, which has given them great computing power and extended their use from personal computers to large clusters. This, together with the proliferation of low-cost RGB-D sensors, has increased the number of vision applications that use this technology to solve problems and to develop new applications. These improvements have been made not only in hardware, that is, in the devices themselves, but also in software, with the appearance of new development tools that make GPU devices easier to program. This new paradigm was named General-Purpose computation on Graphics Processing Units (GPGPU). GPU devices are classified into different families according to their hardware characteristics. Each new family incorporates technological improvements that allow it to outperform the previous ones. However, to obtain optimal performance from a GPU device, it must be configured correctly before use. This configuration is determined by the values assigned to a set of device parameters. Therefore, many of today's implementations that use GPU devices for dense registration of 3D point clouds could improve their performance with an optimal configuration of these parameters, chosen according to the device used.
For this reason, given the lack of a detailed study of how much GPU parameters affect the final performance of an implementation, carrying out such a study was considered highly worthwhile. The study covered not only different configurations of GPU parameters but also different GPU device architectures. Its aim is to provide a decision tool that helps developers when implementing applications for GPU devices. One of the research fields in which the use of these technologies is growing most is robotics, since robotics, and mobile robotics in particular, traditionally relied on combinations of costly sensors of different kinds, such as laser, sonar, or contact sensors, to obtain data about the environment. These data were then used in computer vision applications with a very high computational cost. Both the economic cost of the sensors and the computational cost have been reduced considerably thanks to these new technologies. Point cloud registration is among the most widely used computer vision applications. In general terms, this process transforms different point clouds into a known coordinate system. The data may come from photographs, from different sensors, and so on. It is used in fields such as machine vision, medical imaging, object recognition, and the analysis of satellite images and data. Registration makes it possible to compare or integrate data obtained from different measurements. This work reviews the state of the art in 3D registration methods. It also presents an in-depth study of the most widely used 3D registration method, Iterative Closest Point (ICP), and one of its best-known variants, Expectation-Maximization ICP (EMICP).
The study covers both the sequential implementation and the parallel implementation on GPU devices, focusing on how different configurations of GPU parameters affect performance. As a result of this study, a proposal is also presented to improve the use of GPU device memory, which allows work with larger point clouds and reduces the problem of the memory limit imposed by the device. The behavior of the 3D registration methods used in this work depends heavily on how the problem is initialized. Here, initialization consists of choosing the correct transformation matrix with which the algorithm starts. Because this choice determines whether the algorithm reaches the solution sooner or later, or never reaches it at all, this work presents a study of the space of transformations, with the aim of characterizing it and making it easier to choose the initial transformation for these algorithms.
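As a rough illustration of the Iterative Closest Point idea this abstract refers to, the sketch below runs ICP on small 2D point sets in plain Python. It is a simplified stand-in, not the thesis's sequential or GPU implementation: the 2D setting, the brute-force nearest-neighbour search, and the names `apply_rigid`, `best_rigid_2d` and `icp_2d` are all assumptions made for the example. The `init` argument is the initial transformation whose choice the abstract describes as critical.

```python
import math


def apply_rigid(theta, tx, ty, points):
    """Rotate points by theta (radians) and translate them by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def best_rigid_2d(src, dst):
    """Closed-form least-squares rigid transform mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - sx, y - sy, u - dx, v - dy
        dot += x * u + y * v
        cross += x * v - y * u
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(src, dst, init=(0.0, 0.0, 0.0), iterations=20):
    """Alternate nearest-neighbour matching and rigid re-estimation."""
    theta, tx, ty = init
    for _ in range(iterations):
        moved = apply_rigid(theta, tx, ty, src)
        matched = [
            min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in moved
        ]
        theta, tx, ty = best_rigid_2d(src, matched)
    return theta, tx, ty


# Hypothetical data: the target cloud is the source moved by a small known motion.
source = [(0.0, 0.0), (2.0, 0.0), (4.0, 1.0), (1.0, 3.0), (3.0, 4.0)]
target = apply_rigid(0.1, 0.2, -0.1, source)
estimate = icp_2d(source, target)
```

With a small motion such as this one, starting from the identity is enough; for larger motions the nearest-neighbour matches can be wrong and the loop may settle on a poor alignment, which is the sensitivity to initialization that the abstract studies.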
|
3 |
Development of a Spatial Coordinate Digitizer for Applications in Structural Dynamics using an RGB-D Camera
Udupa, Varun (January 2018)
No description available.
|
4 |
A New Inspection Method Based on RGB-D Profiling
Siddiqui, Affan Ahmed (16 October 2015)
This thesis presents an inspection method based on RGB-D profiling for the rail industry. The proposed approach uses inexpensive RGB-D cameras to capture color and geometric information about the observed scene, and stitches consecutive scans from the sensor into a map, provided that each pair of consecutive scans contains information from the same observation. Using a technique known as pairwise registration, the error between consecutive scans is minimized with algorithms such as Iterative Closest Point and the Normal Distributions Transform. Once the error between each pair of consecutive scans is minimized, the scans are transformed into a global coordinate frame to form a global map of all the added scans. The proposed approach could be used as a map-based identification technique by comparing a past global map with newly acquired scans, while also reducing computation time. The effectiveness of this approach is demonstrated by developing a system that uses multiple RGB-D cameras to detect railway defects such as spikes. The applicability of the proposed approach to other applications is then evaluated by profiling long lengths of road. / Master of Science
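The step from pairwise registration to a global map can be sketched as chaining transforms. The example below is a minimal illustration under assumptions that are not from the thesis: scans are 2D, each pairwise result is already available as a homogeneous 3x3 matrix mapping scan i+1 into the frame of scan i, and the helper names `se2`, `matmul`, `global_poses` and `to_global` are hypothetical.

```python
import math


def se2(theta, tx, ty):
    """Homogeneous 3x3 matrix for a 2D rotation by theta plus a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def global_poses(pairwise):
    """pairwise[i] maps scan i+1 into the frame of scan i.

    Returns the pose of every scan expressed in the frame of scan 0,
    which plays the role of the global coordinate frame here.
    """
    poses = [se2(0.0, 0.0, 0.0)]
    for step in pairwise:
        poses.append(matmul(poses[-1], step))
    return poses


def to_global(pose, point):
    """Express a point measured in a scan's own frame in the global frame."""
    x, y = point
    return (pose[0][0] * x + pose[0][1] * y + pose[0][2],
            pose[1][0] * x + pose[1][1] * y + pose[1][2])


# Hypothetical pairwise results: move 1 unit forward, then 1 unit forward with a 90 degree turn.
steps = [se2(0.0, 1.0, 0.0), se2(math.pi / 2, 1.0, 0.0)]
poses = global_poses(steps)
mapped = to_global(poses[2], (1.0, 0.0))
```

Because each global pose is a product of all earlier pairwise results, any registration error accumulates along the chain, which is one reason the pairwise error is minimized before the scans are merged.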
|
5 |
Desenvolvimento de um método para a captura de movimentos humanos usando uma câmera RGB-D / Development of a method for capturing human motion using an RGB-D camera
Motta, Everton Simões da [UNESP] (22 December 2016)
Previous issue date: 2016-12-22 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Sistemas de captura de movimentos humanos vêm sendo cada dia mais estudados, tanto pela área de Visão Computacional, quanto por grandes empresas do setor de entretenimento. São sistemas capazes de rastrear a posição e orientação das articulações do corpo e sua trajetória no espaço durante um intervalo de tempo. São utilizados em diversas aplicações, tais como em jogos digitais, animação de personagens virtuais para cinema e televisão, reconhecimento gestual, medicina de reabilitação, e outras. O surgimento de novos dispositivos de baixo custo e boa resolução que fornecem informações de profundidade, tem motivado novas pesquisas para a utilização dos mesmos. No entanto, sistemas que se baseiam somente em informações de profundidade (geralmente sistemas em tempo real) não apresentam uma alta acurácia no rastreamento do movimento. Considerando este contexto, o presente trabalho teve como objetivo principal, o desenvolvimento de um método de captura de movimentos humanos utilizando único sensor RGB-D, combinando informações de textura da imagem e de profundidade, que são associados a um esqueleto virtual, conseguindo-se uma maior acurácia em comparação com os métodos baseados apenas em profundidade. Tal método não visa aplicações de tempo real, mas sim, uma maior acurácia em comparação com os métodos baseados apenas em profundidade. / Human motion capture systems are being increasingly studied, in the area of computer vision and also by major entertainment industries. These systems are able to track the position and orientation of joints of the body and its trajectory in space over a period of time. They are used in various applications such as digital games, animation of virtual characters for film and television, gesture recognition, medical rehabilitation, etc. 
The emergence of new low-cost devices with good resolution that provide depth information has prompted new research. However, systems based only on depth information (usually real-time systems) do not achieve high accuracy in motion tracking. In this context, this thesis presents the development of a method for capturing human motion using a single RGB-D sensor, combining image texture and depth information, which are associated with a virtual skeleton, in order to obtain higher accuracy than methods based on depth alone. This method is not intended for real-time applications, but for those that require greater accuracy, such as the animation of virtual characters for video and medical rehabilitation.
|
6 |
Object detection and pose estimation from rectification of natural features using consumer RGB-D sensors
Lima, João Paulo Silva do Monte (31 January 2014)
Previous issue date: 2014 / CAPES , CNPq / Sistemas de Realidade Aumentada são capazes de realizar registro 3D em tempo real de objetos virtuais e reais, o que consiste em posicionar corretamente os objetos virtuais em relação aos reais de forma que os elementos virtuais pareçam ser reais. Uma maneira bastante popular de realizar esse registro é usando detecção e rastreamento de objetos baseado em vídeo a partir de marcadores fiduciais planares. Outra maneira de sensoriar o mundo real usando vídeo é utilizando características naturais do ambiente, o que é mais complexo que usar marcadores planares artificiais. Entretanto, detecção e rastreamento de características naturais é mandatório ou desejável em alguns cenários de aplicação de Realidade Aumentada. A detecção e o rastreamento de objetos a partir de características naturais pode fazer uso de um modelo 3D do objeto obtido a priori. Se tal modelo não está disponível, ele pode ser adquirido usando reconstrução 3D, por exemplo. Nesse caso, um sensor RGB-D pode ser usado, que se tornou nos últimos anos um produto de fácil acesso aos usuários em geral. Ele provê uma imagem em cores e uma imagem de profundidade da cena e, além de ser usado para modelagem de objetos, também pode oferecer informações importantes para a detecção e o rastreamento de objetos em tempo real.
Nesse contexto, o trabalho proposto neste documento tem por finalidade investigar o uso de sensores RGB-D de consumo para detecção e estimação de pose de objetos a partir de características naturais, com o propósito de usar tais técnicas para desenvolver aplicações de Realidade Aumentada. Dois métodos baseados em retificação auxiliada por profundidade são propostos, que transformam características extraídas de uma imagem em cores para uma vista canônica usando dados de profundidade para obter uma representação invariante a rotação, escala e distorções de perspectiva. Enquanto um método é adequado a objetos texturizados, tanto planares como não-planares, o outro método foca em objetos planares não texturizados. Avaliações qualitativas e quantitativas dos métodos propostos são realizadas, mostrando que eles podem obter resultados melhores que alguns métodos existentes para detecção e estimação de pose de objetos, especialmente ao lidar com poses oblíquas. / Augmented Reality systems are able to perform real-time 3D registration of virtual and real objects, which consists of correctly positioning the virtual objects with respect to the real ones so that the virtual elements appear to be real. A very popular way to perform this registration is video-based object detection and tracking with planar fiducial markers. Another way of sensing the real world using video is to rely on natural features of the environment, which is more complex than using artificial planar markers. Nevertheless, natural feature detection and tracking is mandatory or desirable in some Augmented Reality application scenarios. Object detection and tracking from natural features can make use of a 3D model of the object obtained a priori. If such a model is not available, it can be acquired using 3D reconstruction, for example. In this case, an RGB-D sensor can be used, a type of device that has become readily accessible to general users in recent years. 
It provides both a color image and a depth image of the scene and, besides being used for object modeling, it can also offer important cues for object detection and tracking in real-time.
In this context, the work proposed in this document aims to investigate the use of consumer RGB-D sensors for object detection and pose estimation from natural features, with the purpose of using such techniques for developing Augmented Reality applications. Two methods based on depth-assisted rectification are proposed, which transform features extracted from the color image to a canonical view using depth data in order to obtain a representation invariant to rotation, scale and perspective distortions. While one method is suitable for textured objects, either planar or non-planar, the other method focuses on texture-less planar objects. Qualitative and quantitative evaluations of the proposed methods are performed, showing that they can obtain better results than some existing methods for object detection and pose estimation, especially when dealing with oblique poses.
|
7 |
Vers un système de capture du mouvement humain en 3D pour un robot mobile évoluant dans un environnement encombré / Toward a motion capture system in 3D for a mobile robot moving in a cluttered environment
Dib, Abdallah (24 May 2016)
Dans cette thèse nous intéressons à la conception d'un robot mobile capable d’analyser le comportement et le mouvement d’une personne en environnement intérieur et encombré, par exemple le domicile d’une personne âgée. Plus précisément, notre objectif est de doter le robot des capacités de perception visuelle de la posture humaine de façon à mieux maîtriser certaines situations qui nécessitent de comprendre l’intention des personnes avec lesquelles le robot interagit, ou encore de détecter des situations à risques comme les chutes ou encore d’analyser les capacités motrices des personnes dont il a la garde. Le suivi de la posture dans un environnement dynamique et encombré relève plusieurs défis notamment l'apprentissage en continue du fond de la scène et l'extraction la silhouette qui peut être partiellement observable lorsque la personne est dans des endroits occultés. Ces difficultés rendent le suivi de la posture une tâche difficile. La majorité des méthodes existantes, supposent que la scène est statique et la personne est toujours visible en entier. Ces approches ne sont pas adaptées pour fonctionner dans des conditions réelles. Nous proposons, dans cette thèse, un nouveau système de suivi capable de suivre la posture de la personne dans ces conditions réelles. Notre approche utilise une grille d'occupation avec un modèle de Markov caché pour apprendre en continu l'évolution de la scène et d'extraire la silhouette, ensuite un algorithme de filtrage particulaire hiérarchique est utilisé pour reconstruire la posture. Nous proposons aussi un nouvel algorithme de gestion d'occlusion capable d'identifier et d'exclure les parties du corps cachées du processus de l'estimation de la pose. Finalement, nous avons proposé une base de données contenant des images RGB-D avec la vérité-terrain dans le but d'établir une nouvelle référence pour l'évaluation des systèmes de capture de mouvement dans un environnement réel avec occlusions. 
La vérité-terrain est obtenue à partir d'un système de capture de mouvement à base de marqueur de haute précision avec huit caméras infrarouges. L'ensemble des données est disponible en ligne. La deuxième contribution de cette thèse, est le développement d'une méthode de localisation visuelle à partir d'une caméra du type RGB-D montée sur un robot qui se déplace dans un environnement dynamique. En effet, le système de capture de mouvement que nous avons développé doit équiper un robot se déplaçant dans une scène. Ainsi, l'estimation de mouvement du robot est importante pour garantir une extraction de silhouette correcte pour le suivi. La difficulté majeure de la localisation d'une caméra dans un environnement dynamique, est que les objets mobiles de la scène induisent un mouvement supplémentaire qui génère des pixels aberrants. Ces pixels doivent être exclus du processus de l'estimation du mouvement de la caméra. Nous proposons ainsi une extension de la méthode de localisation dense basée sur le flux optique pour isoler les pixels aberrants en utilisant l'algorithme de RANSAC. / In this thesis, we are interested in designing a mobile robot able to analyze the behavior and movement of a person in a cluttered indoor environment. Our goal is to equip the robot with visual perception of human posture so that it can better handle situations that require understanding the intention of the people with whom it interacts, detect risky situations such as falls, or analyze a person's motor skills. Motion capture in a dynamic and cluttered environment raises multiple challenges, such as continuously learning the background of the scene and extracting the silhouette, which may be only partially observable when the person is in occluded places. These difficulties make motion capture hard. Most existing methods assume that the scene is static and that the person is always fully visible to the camera. These approaches are not suited to such realistic conditions. 
In this thesis, we propose a new motion capture system capable of tracking a person in real-world conditions. Our approach uses a 3D occupancy grid with a hidden Markov model to continuously learn the changing background of the scene and to extract the silhouette of the person; a hierarchical particle filtering algorithm is then used to reconstruct the posture. We also propose a novel occlusion management algorithm that identifies hidden body parts and excludes them from the pose estimation process. In addition, we provide a new database containing RGB-D images with ground truth data, in order to establish a new benchmark for assessing motion capture systems in a real environment with occlusions. The ground truth is obtained from a high-precision marker-based motion capture system with eight infrared cameras. All data is available online. The second contribution of this thesis is a new visual odometry method to localize an RGB-D camera mounted on a robot moving in a dynamic environment. The major difficulty of localization in a dynamic environment is that moving objects in the scene induce additional motion that generates outlier pixels. These pixels should be excluded from the camera motion estimation process in order to produce accurate localization. We thus propose an extension of the dense, optical-flow-based localization method that removes outlier pixels using the RANSAC algorithm.
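The role RANSAC plays in rejecting measurements caused by moving objects can be illustrated with a deliberately small model. The sketch below is not the dense optical-flow method of the thesis: it assumes sparse 2D point correspondences and a pure-translation camera motion, and the function name `ransac_translation`, the threshold, and the data are hypothetical choices made for the example.

```python
import random


def ransac_translation(pairs, threshold=0.1, iterations=50, seed=0):
    """Estimate a 2D translation from (before, after) point pairs with RANSAC.

    Each iteration hypothesizes the motion from one randomly chosen pair
    (the minimal sample for a translation) and counts the pairs that agree
    with it. The largest consensus set is kept and used for a final average.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        (px, py), (qx, qy) = rng.choice(pairs)
        tx, ty = qx - px, qy - py
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(pairs)
            if (bx - ax - tx) ** 2 + (by - ay - ty) ** 2 <= threshold ** 2
        ]
        if len(inliers) > len(best):
            best = inliers
    n = len(best)
    tx = sum(pairs[i][1][0] - pairs[i][0][0] for i in best) / n
    ty = sum(pairs[i][1][1] - pairs[i][0][1] for i in best) / n
    return (tx, ty), best


# Hypothetical data: seven static points shifted by the camera motion (1.0, 0.5),
# followed by three points on independently moving objects.
static = [(0, 0), (1, 2), (3, 1), (4, 4), (2, 5), (5, 0), (6, 3)]
pairs = [((x, y), (x + 1.0, y + 0.5)) for x, y in static]
pairs += [((2, 2), (4.0, 0.0)), ((7, 1), (5.5, 3.0)), ((3, 6), (3.0, 9.0))]
motion, inlier_ids = ransac_translation(pairs)
```

The pairs left out of `inlier_ids` are the ones inconsistent with the dominant motion; in the setting the abstract describes, the analogous pixels would be excluded from the camera motion estimate.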
|
8 |
Adaptive registration using 2D and 3D features for indoor scene reconstruction. / Registro adaptativo usando características 2D e 3D para reconstrução de cenas em ambientes internos.
Perafán Villota, Juan Carlos (27 October 2016)
Pairwise alignment between point clouds is an important task in building 3D maps of indoor environments with partial information. The combination of 2D local features with depth information provided by RGB-D cameras is often used to improve such alignment. However, under varying lighting or low visual texture, indoor pairwise frame registration with sparse 2D local features is not particularly robust. In these conditions, features are hard to detect, which leads to misalignment between consecutive pairs of frames. The use of 3D local features can be a solution, as such features come from the 3D points themselves and are resistant to variations in visual texture and illumination. Because varying conditions in real indoor scenes are unavoidable, we propose a new framework to improve pairwise frame alignment using an adaptive combination of sparse 2D and 3D features, based on the levels of both geometric structure and visual texture contained in each scene. Experiments with datasets including unrestricted RGB-D camera motion and natural changes in illumination show that the proposed framework outperforms methods using 2D or 3D features separately, as reflected in better alignment accuracy. / O alinhamento entre pares de nuvens de pontos é uma tarefa importante na construção de mapas de ambientes em 3D. A combinação de características locais 2D com informação de profundidade fornecida por câmeras RGB-D são frequentemente utilizadas para melhorar tais alinhamentos. No entanto, em ambientes internos com baixa iluminação ou pouca textura visual o método usando somente características locais 2D não é particularmente robusto. Nessas condições, as características 2D são difíceis de serem detectadas, conduzindo a um desalinhamento entre pares de quadros consecutivos. 
A utilização de características 3D locais pode ser uma solução uma vez que tais características são extraídas diretamente de pontos 3D e são resistentes a variações na textura visual e na iluminação. Como situações de variações em cenas reais em ambientes internos são inevitáveis, essa tese apresenta um novo sistema desenvolvido com o objetivo de melhorar o alinhamento entre pares de quadros usando uma combinação adaptativa de características esparsas 2D e 3D. Tal combinação está baseada nos níveis de estrutura geométrica e de textura visual contidos em cada cena. Esse sistema foi testado com conjuntos de dados RGB-D, incluindo vídeos com movimentos irrestritos da câmera e mudanças naturais na iluminação. Os resultados experimentais mostram que a nossa proposta supera aqueles métodos que usam características 2D ou 3D separadamente, obtendo uma melhora da precisão no alinhamento de cenas em ambientes internos reais.
|
9 |
Unconstrained Gaze Estimation Using RGB-D Camera. / Estimation du regard avec une caméra RGB-D dans des environnements utilisateur non-contraints
Kacete, Amine (15 December 2016)
Dans ce travail, nous avons abordé le problème d’estimation automatique du regard dans des environnements utilisateur sans contraintes. Ce travail s’inscrit dans la vision par ordinateur appliquée à l’analyse automatique du comportement humain. Plusieurs solutions industrielles sont aujourd’hui commercialisées et donnent des estimations précises du regard. Certaines ont des spécifications matérielles très complexes (des caméras embarquées sur un casque ou sur des lunettes qui filment le mouvement des yeux) et présentent un niveau d’intrusivité important, ces solutions sont souvent non accessible au grand public. Cette thèse vise à produire un système d’estimation automatique du regard capable d’augmenter la liberté du mouvement de l’utilisateur par rapport à la caméra (mouvement de la tête, distance utilisateur-capteur), et de réduire la complexité du système en utilisant des capteurs relativement simples et accessibles au grand public. Dans ce travail, nous avons exploré plusieurs paradigmes utilisés par les systèmes d’estimation automatique du regard. Dans un premier temps, Nous avons mis au point deux systèmes basés sur deux approches classiques: le premier basé caractéristiques et le deuxième basé semi apparence. L’inconvénient majeur de ces paradigmes réside dans la conception des systèmes d'estimation du regard qui supposent une indépendance totale entre l'image d'apparence des yeux et la pose de la tête. Pour corriger cette limitation, Nous avons convergé vers un nouveau paradigme qui unifie les deux blocs précédents en construisant un espace regard global, nous avons exploré deux directions en utilisant des données réelles et synthétiques respectivement. / In this thesis, we tackled the automatic gaze estimation problem in unconstrained user environments. This work takes place in the computer vision research field applied to the perception of humans and their behaviors. 
Many existing industrial solutions are commercially available and provide acceptable accuracy in gaze estimation. These solutions often use complex hardware, such as sets of infrared cameras (head-mounted or in a remote system), which makes them intrusive, heavily constrained by the user's environment, and unsuitable for large-scale public use. We focus on estimating gaze using cheap, low-resolution, non-intrusive devices such as the Kinect sensor. We develop new methods to address challenging conditions such as head pose changes, varying illumination, and large user-sensor distances. In this work we investigated different gaze estimation paradigms. We first developed two automatic gaze estimation systems following two classical approaches: feature-based and semi-appearance-based. The major limitation of these paradigms lies in the way they design gaze systems, which assume total independence between the eye appearance and head pose blocks. To overcome this limitation, we converged on a novel paradigm that unifies the two previous components and builds a global gaze manifold. We explored two global approaches in the experiments, using synthetic and real RGB-D gaze samples.
|
10 |
Suivi de caméra image en temps réel base et cartographie de l'environnement / Real-time image-based RGB-D camera motion tracking and environment mapping
Tykkälä, Tommi (04 September 2013)
Dans ce travail, méthodes d'estimation basées sur des images, également connu sous le nom de méthodes directes, sont étudiées qui permettent d'éviter l'extraction de caractéristiques et l'appariement complètement. L'objectif est de produire pose 3D précis et des estimations de la structure. Les fonctions de coût présenté minimiser l'erreur du capteur, car les mesures ne sont pas transformés ou modifiés. Dans la caméra photométrique estimation de la pose, rotation 3D et les paramètres de traduction sont estimées en minimisant une séquence de fonctions de coûts à base d'image, qui sont des non-linéaires en raison de la perspective projection et la distorsion de l'objectif. Dans l'image la structure basée sur le raffinement, d'autre part, de la structure 3D est affinée en utilisant un certain nombre de vues supplémentaires et un coût basé sur l'image métrique. Les principaux domaines d'application dans ce travail sont des reconstitutions d'intérieur, la robotique et la réalité augmentée. L'objectif global du projet est d'améliorer l'image des méthodes d'estimation fondées, et pour produire des méthodes de calcul efficaces qui peuvent être accueillis dans des applications réelles. Les principales questions pour ce travail sont : Qu'est-ce qu'une formulation efficace pour une image 3D basé estimation de la pose et de la structure tâche de raffinement ? Comment organiser calcul afin de permettre une mise en œuvre efficace en temps réel ? Quelles sont les considérations pratiques utilisant l'image des méthodes d'estimation basées sur des applications telles que la réalité augmentée et la reconstruction 3D ? / In this work, image based estimation methods, also known as direct methods, are studied which avoid feature extraction and matching completely. Cost functions use raw pixels as measurements and the goal is to produce precise 3D pose and structure estimates. The cost functions presented minimize the sensor error, because measurements are not transformed or modified. 
In photometric camera pose estimation, 3D rotation and translation parameters are estimated by minimizing a sequence of image-based cost functions, which are non-linear due to perspective projection and lens distortion. In image-based structure refinement, on the other hand, 3D structure is refined using a number of additional views and an image-based cost metric. Image-based estimation methods are usable whenever the Lambertian illumination assumption holds, that is, when 3D points have constant color regardless of viewing angle. The main application domains in this work are indoor 3D reconstruction, robotics and augmented reality. The overall project goal is to improve image-based estimation methods and to produce computationally efficient methods that can be accommodated in real applications. The main questions for this work are: What is an efficient formulation for an image-based 3D pose estimation and structure refinement task? How should computation be organized to enable an efficient real-time implementation? What are the practical considerations of using image-based estimation methods in applications such as augmented reality and 3D reconstruction?
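The idea of a direct method, minimizing a cost built from raw intensities rather than from matched features, can be shown with a one-parameter toy problem. The sketch below is far simpler than the 3D pose estimation described in the abstract: it assumes 1D "images" given as continuous functions, a single unknown shift, and a Gauss-Newton loop with a numerical derivative. The names `estimate_shift`, `reference` and `current`, and the 0.7 shift, are hypothetical.

```python
import math


def numeric_derivative(f, x, h=1e-4):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def estimate_shift(ref, cur, samples, d0=0.0, iterations=10):
    """Find the shift d that minimizes sum((cur(x + d) - ref(x)) ** 2).

    The residuals are raw intensity differences, so nothing is extracted
    or matched. Each Gauss-Newton step linearizes the residuals around
    the current d and solves the resulting one-dimensional normal equation.
    """
    d = d0
    for _ in range(iterations):
        jr = jj = 0.0
        for x in samples:
            r = cur(x + d) - ref(x)              # photometric residual
            j = numeric_derivative(cur, x + d)   # d(residual)/d(d)
            jr += j * r
            jj += j * j
        d -= jr / jj
    return d


def reference(x):
    """Toy 1D intensity profile of the reference view."""
    return math.sin(0.3 * x)


def current(x):
    """The same profile seen after a shift of 0.7 units."""
    return reference(x - 0.7)


shift = estimate_shift(reference, current, range(40))
```

The cost is non-linear in the shift even in this toy case, which is why the estimate is refined iteratively; the constant-intensity assumption behind the residual corresponds to the Lambertian assumption mentioned in the abstract.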
|