1 |
Robust 3D registration and tracking with RGBD sensors. Amamra, A. 26 June 2015 (has links)
This thesis investigates the use of cheap RGBD sensors for rigid body tracking and 3D multiview registration in augmented and virtual reality applications. RGBD sensors can serve as an affordable substitute for the more sophisticated, but expensive, conventional laser-based scanning and tracking solutions. Nevertheless, the low-cost sensing technology behind them has several drawbacks, such as limited range, significant noise and instability.
To deal with these issues, an innovative adaptation of the Kalman filtering scheme is first proposed to improve the precision, smoothness and robustness of raw RGBD outputs. It also extends the native capabilities of the sensor to capture more distant targets. The mathematical foundations of this adaptation are explained in detail, and its corrective effect is validated with real tracking as well as 3D reconstruction experiments. A Graphics Processing Unit (GPU) implementation with several levels of optimisation is also proposed to ensure real-time responsiveness.
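The filtering idea above can be sketched as a minimal scalar Kalman filter smoothing a stream of noisy depth readings. The thesis's actual adaptation is more elaborate; the noise variances `q` and `r` below are illustrative assumptions, not values from the work:

```python
import numpy as np

def kalman_smooth_depth(z, q=1e-4, r=0.05):
    """Scalar Kalman filter over a noisy depth stream z.

    Assumes a locally constant depth (random-walk process model).
    q: process noise variance, r: measurement noise variance
    (both illustrative assumptions).
    """
    x, p = z[0], 1.0            # initial state estimate and covariance
    out = []
    for zk in z:
        p = p + q               # predict: covariance grows by process noise
        k = p / (p + r)         # Kalman gain
        x = x + k * (zk - x)    # update with the new measurement
        p = (1.0 - k) * p       # shrink covariance after the update
        out.append(x)
    return np.array(out)
```

Run on an alternating-noise depth signal, the filtered output fluctuates far less than the raw readings, which is the smoothing effect the thesis exploits before tracking and reconstruction.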
After extensive experimentation with RGBD cameras, a significant difference in accuracy was observed between newer and ageing sensors. This decay could not be corrected with conventional calibration, so a novel method for correcting worn RGBD sensors is also proposed.
An algorithm for background/foreground segmentation of RGBD images is also contributed. It performs background subtraction on the colour and depth images separately; the resulting foreground regions are then fused for more robust detection.
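A minimal sketch of per-modality background subtraction followed by fusion might look like the following. The union rule and the thresholds are illustrative assumptions; the thesis does not specify its exact fusion operator:

```python
import numpy as np

def foreground_mask(frame, background, thresh):
    """Per-modality background subtraction: absolute-difference thresholding."""
    return np.abs(frame.astype(float) - background.astype(float)) > thresh

def fuse_foregrounds(colour, colour_bg, depth, depth_bg,
                     c_thresh=30.0, d_thresh=0.05):
    """Subtract backgrounds independently in colour and depth, then fuse.

    A union (OR) tolerates failure of one channel (e.g. depth holes or
    colour camouflage); it is one plausible rule, assumed for illustration.
    """
    fg_c = foreground_mask(colour, colour_bg, c_thresh)
    fg_d = foreground_mask(depth, depth_bg, d_thresh)
    return np.logical_or(fg_c, fg_d)
```

A pixel that moves only in colour or only in depth is still flagged as foreground, which is the robustness gain the fusion step is after.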
The three previous contributions are combined in a novel approach to multiview vehicle tracking for mixed reality. The vehicle's position is determined in two stages: the first is a sensor-wise robust filtering algorithm that handles the uncertainties in the system and measurement models, yielding multiple position estimates; the second merges these independent estimates using a set of optimal weighting coefficients. The fused result is then used to determine the vehicle's orientation in the scene.
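One standard way to merge independent estimates with optimal weights is inverse-variance (minimum-variance) weighting; whether the thesis computes its coefficients exactly this way is not stated, so this is a hedged sketch of the general idea:

```python
import numpy as np

def fuse_estimates(estimates, variances):
    """Minimum-variance fusion of independent position estimates.

    estimates: (n_sensors, 3) array-like of per-sensor positions.
    variances: per-sensor error variances; weights are proportional
    to 1/variance, so noisier sensors contribute less.
    """
    w = 1.0 / np.asarray(variances, dtype=float)
    w /= w.sum()
    return np.average(np.asarray(estimates, dtype=float), axis=0, weights=w)
```

With equal variances this reduces to a plain mean; with one very unreliable sensor, the fused position collapses onto the trustworthy estimate.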
Finally, a novel recursive filtering approach for sparse registration is proposed. Unlike state-of-the-art alignment algorithms, the proposed method offers four advantages not found together in any previous solution: it deals with the inherent noise contaminating sensory data; it is robust to uncertainties in feature localisation; it combines the advantages of the L2 and L∞ norms for higher performance and avoidance of local minima; and it provides an estimated rigid-body transformation along with its error covariance. This 3D registration scheme is validated in various challenging scenarios with both synthetic and real RGBD data.
|
3 |
Difference-Based Temporal Module for Monocular Category-Level 6 DoF Object Pose Tracking. Chen, Zishen. 22 January 2024 (has links)
Monocular 6DoF pose tracking has many applications in augmented reality, robotics and other areas, and with the rise of deep learning, new approaches such as category-level models have proven successful. The temporal information in sequential data is essential for both online and offline tasks and can help boost the quality of predictions under unexpected influences such as occlusions and vibration. In 2D object detection and tracking, substantial research has leveraged temporal information to improve model performance. Nevertheless, it is challenging to lift temporal processing to 3D space because of the ambiguity of the visual data. In this thesis, we propose a method to calculate the temporal difference of points and pixels, assuming that the K nearest points share similar features. Features extracted from this difference are learned to weight the relevant points in the temporal sequence and aggregate them to support the current frame's prediction. We propose a novel difference-based temporal module that incorporates both RGB and 3D point data over a temporal sequence. This module can be easily integrated with any category-level 6DoF pose tracking model that uses RGB and 3D points as input. We evaluate this module on two state-of-the-art category-level 6D pose tracking models, and the results show that it increases the models' accuracy and robustness in complex scenarios.
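The temporal-difference idea, under the stated K-nearest-points assumption, can be illustrated with a naive brute-force sketch. The actual module learns weights over these differences rather than averaging them; everything here is a simplified assumption for illustration:

```python
import numpy as np

def knn_point_differences(points_t, points_prev, k=3):
    """For each 3D point in the current frame, find its k nearest points
    in the previous frame and return the mean offset to them, assuming
    nearby points share similar features (the thesis's premise).

    points_t, points_prev: (N, 3) and (M, 3) arrays.
    """
    diffs = []
    for p in points_t:
        d = np.linalg.norm(points_prev - p, axis=1)  # distances to all prev points
        idx = np.argsort(d)[:k]                      # k nearest neighbours
        diffs.append((points_prev[idx] - p).mean(axis=0))
    return np.array(diffs)
```

On a point cloud that simply translated between frames, with k=1, each difference recovers the negated translation, which is the kind of motion cue the learned module aggregates.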
|
4 |
Development of a Multimodal Human-computer Interface for the Control of a Mobile Robot. Jacques, Maxime. 07 June 2012 (has links)
The recent advent of consumer-grade Brain-Computer Interfaces (BCIs) provides a revolutionary and accessible new way to control computers. BCIs translate cognitive electroencephalography (EEG) signals into computer or robotic commands using specially built headsets. Capable of enhancing traditional interfaces that require interaction with a keyboard, mouse or touchscreen, BCI systems present tremendous opportunities to benefit various fields. Movement-restricted users can especially benefit from these interfaces. In this thesis, we present a new way to interface a consumer-grade BCI solution to a mobile robot. A Red-Green-Blue-Depth (RGBD) camera is used to enhance the navigation of the robot with cognitive thoughts as commands. We introduce an interface presenting three robot-control methods: 1) a fully manual mode, where a cognitive signal is interpreted as a command, 2) a control-flow manual mode, reducing the likelihood of false-positive commands, and 3) an automatic mode assisted by a remote RGBD camera. We study the application of this work by navigating the mobile robot on a planar surface using the different control methods while measuring the accuracy and usability of the system. Finally, we assess the newly designed interface's role in the design of future generations of BCI solutions.
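The control-flow mode's false-positive reduction can be illustrated by a simple confirmation filter that only emits a command once the same cognitive signal has been detected several times in a row. The repeat count and the exact mechanism are assumptions for illustration; the thesis does not publish its filter:

```python
def confirmed_commands(signals, repeats=3):
    """Emit a command only after `repeats` consecutive identical detections.

    A single spurious EEG classification cannot trigger the robot; each
    confirmed run fires exactly once (on the `repeats`-th detection).
    """
    out, last, count = [], None, 0
    for s in signals:
        count = count + 1 if s == last else 1
        last = s
        if count == repeats:
            out.append(s)
    return out
```

A stray "left" in the middle of a "fwd" run is discarded, at the cost of added latency before each confirmed command, the usual trade-off of such debouncing.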
|
7 |
Robust visual recognition by neural networks in robotic exploration scenarios. Detect me if you can! Guerry, Joris. 20 November 2017 (has links)
The main objective of this thesis is visual recognition for a mobile robot in difficult conditions. We are particularly interested in neural networks, which today achieve the best performance in computer vision. We studied the concept of method selection for the classification of 2D images, using a neural network selector to choose the best available classifier given the observed situation. This strategy works when data can be easily partitioned with respect to the available classifiers, which is the case when complementary modalities are used. We therefore used RGB-D (2.5D) data, applied in particular to people detection. We propose a combination of independent neural network detectors specific to each modality (colour & depth map) based on the same architecture (Faster RCNN). We share intermediate results between the detectors to let them complement each other and improve overall performance in difficult situations (loss of luminosity or acquisition noise in the depth map). We establish a new state of the art in the field and offer a more complex and richer dataset to the community (ONERA.ROOM). Finally, we made use of the 3D information contained in RGB-D images through a multi-view method. We defined a strategy for generating 2D virtual views that are consistent with the 3D structure. For a semantic segmentation task, this approach artificially augments the training data for each RGB-D image and accumulates different predictions at test time. We obtain new reference results on the SUNRGBD and NYUDv2 datasets. This work allowed us to handle 2D, 2.5D and 3D robotic data with neural networks in an original way. Whether for classification, detection or semantic segmentation, we not only validated our approaches on difficult datasets, but also brought the state of the art to a new level of performance.
|
8 |
Motion estimation from RGBD images using graph homomorphism. Pires, David da Silva. 14 December 2012 (has links)
Depth-sensing devices have recently arisen, allowing real-time capture of scene texture and depth. As a result, many computer vision techniques, previously applied only to textures, can now be reformulated to also use geometry. While these algorithms can be accelerated or made more robust by this new technology, interesting new challenges and problems also appear. Examples of such devices include the 4D Video Project from IMPA and Microsoft's Kinect (TM). These devices provide so-called RGBD images, named after the three colour channels and the additional depth channel. The research described in this thesis presents a new unsupervised approach to estimating motion from videos composed of RGBD images. This is an intermediate step necessary to identify the rigid components of an articulated object. Our method uses inexact graph matching (homomorphism) to find groups of pixels (patches) that move in the same direction in subsequent video frames. To choose the best match for each patch, we minimise a cost function that accounts for distances in both the RGB colour space and the XYZ space (tridimensional world coordinates). The methodological contribution lies in the handling of the depth data provided by the new capture devices, so that these data become part of the feature vector representing each patch in the graphs to be matched. Our method requires no reference frames for initialisation and applies to any video containing piecewise parametric motion. For patch sizes that imply a relative decrease in image resolution, our application runs in real time. To validate the proposed methodology, we present results involving object classes with different kinds of movement, such as videos of walking people, the motion of an arm, and a couple dancing samba de gafieira. We also present advances in modelling an object-oriented 4D video system, which guides several applications to be developed as future work.
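The cost minimised over RGB and XYZ distances can be sketched as a weighted sum of the two terms. The balance `alpha` and the per-patch feature layout are illustrative assumptions; the thesis's exact cost function is richer than this:

```python
import numpy as np

def match_cost(patch_a, patch_b, alpha=0.5):
    """Cost for matching two patches: weighted sum of Euclidean distances
    in RGB colour space and XYZ world space.

    alpha balances colour vs. geometry (value assumed for illustration).
    Each patch is a dict with mean "rgb" and "xyz" feature vectors.
    """
    rgb_d = np.linalg.norm(patch_a["rgb"] - patch_b["rgb"])
    xyz_d = np.linalg.norm(patch_a["xyz"] - patch_b["xyz"])
    return alpha * rgb_d + (1.0 - alpha) * xyz_d

def best_match(patch, candidates, alpha=0.5):
    """Pick the candidate index minimising the cost, as in inexact graph
    matching where each node seeks its lowest-cost correspondence."""
    return min(range(len(candidates)),
               key=lambda i: match_cost(patch, candidates[i], alpha))
```

Including the XYZ term is what distinguishes this from colour-only matching: a patch that looks similar but sits far away in 3D is penalised and rejected.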
|
9 |
Scene Flow Estimation from RGBD Images. Quiroga Sepúlveda, Julián. 07 November 2014 (links)
This thesis addresses the problem of reliably recovering a 3D motion field, or scene flow, from a temporal pair of RGBD images. We propose a semi-rigid estimation framework for the robust computation of scene flow, taking advantage of colour and depth information, and an alternating variational minimisation framework for recovering the rigid and non-rigid components of the 3D motion field. Previous attempts to estimate scene flow from RGBD images extended optical flow approaches without fully exploiting depth data, or formulated the estimation in 3D space while disregarding the semi-rigidity of real scenes. We demonstrate that scene flow can be robustly and accurately computed in the image domain by solving for 3D motions consistent with colour and depth, encouraging an adjustable combination of local and piecewise rigidity. Additionally, we show that solving for the 3D motion field can be seen as a specific case of a more general estimation problem over a 6D field of rigid motions. Accordingly, we formulate scene flow estimation as the search for an optimal field of twist motions, achieving state-of-the-art results.
|
10 |
Augmented reality: the fusion of vision and navigation. Zarrouati-Vissière, Nadège. 20 December 2013 (has links)
The purpose of this thesis is to study algorithms for visual augmented reality. Several requirements of such applications are addressed, under the constraint that the use of a monocular system makes depth and linear motion indistinguishable. The realistic, real-time insertion of virtual objects into images of a real, arbitrary and unknown environment demands both a dense three-dimensional (3D) perception of this environment at every instant and a precise localisation of the camera within it. The first requirement is studied under an assumption of known camera dynamics, and the second under an assumption of known depth: both assumptions are practically realisable. Both problems are posed in the context of a spherical camera model, which yields SO(3)-invariant dynamical equations for light intensity and depth. The theoretical observability of these problems is studied using differential geometry tools on the Riemannian unit sphere. A practical implementation is presented: experimental results demonstrate the ability to localise a camera in an unknown environment while precisely mapping this environment.
|