Global ETD Search

41	Semantic Scene Segmentation using RGB-D & LRF fusion Lilja, Harald January 2020 In the field of robotics and autonomous vehicles, the use of RGB-D data and LiDAR sensors is a popular practice for applications such as SLAM[14], object classification[19] and scene understanding[5]. This thesis explores the problem of semantic segmentation using deep multimodal fusion of LRF and depth data. Two data sets consisting of 1080 and 108 data points from two scenes are created, manually labeled in 2D space, and transferred to 1D using a proposed label transfer method utilizing hierarchical clustering. The data set is used to train and validate the suggested segmentation method, which uses a proposed dual encoder-decoder network based on SalsaNet [1] with gradual fusion in the decoder. Applying the suggested method yielded an improvement in the scenario of an unseen circuit when compared to uni-modal segmentation using depth, RGB, laser, and a naive combination of RGB-D data. Feature extraction in the form of PCA or stacked auto-encoders is suggested as a further improvement for this type of fusion. The source code and data set are made publicly available at https://github.com/Anguse/salsa_fusion. RGB-D LiDAR CNN deep multimodal fusion robotics autonomous vehicles Computer Systems Datorsystem Robotics Robotteknik och automation Embedded Systems Inbäddad systemteknik
42	Particle filter-based tracking to handle persistent and complex occlusions and imitate arbitrary black-box trackers / 長時間・複雑な遮蔽に対応、任意の追跡器を模倣可能なパーティクル・フィルターに基づく物体追跡 Kourosh, Meshgi 24 September 2015 Kyoto University / 0048 / New system, course-based doctorate / Doctor of Informatics / Degree No. Kō 19342 / Informatics Doctorate No. 594 / 新制\|\|情\|\|103 (University Library) / 32344 / Department of Systems Science, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Shin Ishii, Prof. Toshiharu Sugie, Prof. Toshiyuki Ohtsuka / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Persistent and Complex Occlusions RGB-D Tracking Particle Filter Tracker Imitation Occlusion Mask Gridding Mode-Switching Tracker 007
43	Deep Synthetic Noise Generation for RGB-D Data Augmentation Hammond, Patrick Douglas 01 June 2019 Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully-corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively smaller dataset using synthetically damaged depth-data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically so that the network is able to learn to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement.
We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets. RGB-D images depth completion synthetic augmentation deep-generative neural networks variational autoencoders conditional GANs Computer Sciences Physical Sciences and Mathematics
44	Examining the Effects of Key Point Detector and Descriptors on 3D Visual SLAM Murphy, Timothy Charles 27 April 2016 No description available. Computer Science Robotics
45	Odométrie visuelle directe et cartographie dense de grands environnements à base d'images panoramiques RGB-D / Direct visual odometry and dense large-scale environment mapping from panoramic RGB-D images Martins, Renato 27 October 2017 Cette thèse se situe dans le domaine de l'auto-localisation et de la cartographie 3D des caméras RGB-D pour des robots mobiles et des systèmes autonomes avec des caméras RGB-D. Nous présentons des techniques d'alignement et de cartographie pour effectuer la localisation d'une caméra (suivi), notamment pour des caméras avec mouvements rapides ou avec faible cadence. Les domaines d'application possibles sont la réalité virtuelle et augmentée, la localisation de véhicules autonomes ou la reconstruction 3D des environnements. Nous proposons un cadre consistant et complet au problème de localisation et cartographie 3D à partir de séquences d'images RGB-D acquises par une plateforme mobile. Ce travail explore et étend le domaine d'applicabilité des approches de suivi direct dites "appearance-based". Vis-à-vis des méthodes fondées sur l'extraction de primitives, les approches directes permettent une représentation dense et plus précise de la scène mais souffrent d'un domaine de convergence plus faible nécessitant une hypothèse de petits déplacements entre images. Dans la première partie de la thèse, deux contributions sont proposées pour augmenter ce domaine de convergence. Tout d'abord une méthode d'estimation des grands déplacements est développée s'appuyant sur les propriétés géométriques des cartes de profondeurs contenues dans l'image RGB-D. Cette estimation grossière (rough estimation) peut être utilisée pour initialiser la fonction de coût minimisée dans l'approche directe. Une seconde contribution porte sur l'étude des domaines de convergence de la partie photométrique et de la partie géométrique de cette fonction de coût.
Il en résulte une nouvelle fonction de coût exploitant de manière adaptative l'erreur photométrique et géométrique en se fondant sur leurs propriétés de convergence respectives. Dans la deuxième partie de la thèse, nous proposons des techniques de régularisation et de fusion pour créer des représentations précises et compactes de grands environnements. La régularisation s'appuie sur une segmentation de l'image sphérique RGB-D en patchs utilisant simultanément les informations géométriques et photométriques afin d'améliorer la précision et la stabilité de la représentation 3D de la scène. Cette segmentation est également adaptée pour la résolution non uniforme des images panoramiques. Enfin les images régularisées sont fusionnées pour créer une représentation compacte de la scène, composée de panoramas RGB-D sphériques distribués de façon optimale dans l'environnement. Ces représentations sont particulièrement adaptées aux applications de mobilité, tâches de navigation autonome et de guidage, car elles permettent un accès en temps constant avec une faible occupation de mémoire qui ne dépendent pas de la taille de l'environnement. / This thesis is in the context of self-localization and 3D mapping from RGB-D cameras for mobile robots and autonomous systems. We present image alignment and mapping techniques to perform the camera localization (tracking) notably for large camera motions or low frame rate. Possible domains of application are localization of autonomous vehicles, 3D reconstruction of environments, security or in virtual and augmented reality. We propose a consistent localization and 3D dense mapping framework considering as input a sequence of RGB-D images acquired from a mobile platform. The core of this framework explores and extends the domain of applicability of direct/dense appearance-based image registration methods.
Compared with feature-based techniques, direct/dense image registration (or image alignment) techniques are more accurate and allow a more consistent dense representation of the scene. However, these techniques have a smaller domain of convergence and rely on the assumption that the camera motion is small. In the first part of the thesis, we propose two formulations to relax this assumption. Firstly, we describe a fast pose estimation strategy to compute a rough estimate of large motions, based on the normal vectors of the scene surfaces and on the geometric properties between the RGB-D images. This rough estimation can be used as initialization to direct registration methods for refinement. Secondly, we propose a direct RGB-D camera tracking method that adaptively exploits the photometric and geometric error properties to improve the convergence of the image alignment. In the second part of the thesis, we propose techniques of regularization and fusion to create compact and accurate representations of large scale environments. The regularization is performed from a segmentation of spherical frames in piecewise patches using simultaneously the photometric and geometric information to improve the accuracy and the consistency of the scene 3D reconstruction. This segmentation is also adapted to tackle the non-uniform resolution of panoramic images. Finally, the regularized frames are combined to build a compact keyframe-based map composed of spherical RGB-D panoramas optimally distributed in the environment. These representations are helpful for autonomous navigation and guiding tasks as they allow constant-time access with limited storage that does not depend on the size of the environment. Recalage d'images Cartographie Odométrie visuelle Localisation SLAM visuel Images panoramiques RGB-D registration Mapping Visual odometry Localization Visual SLAM Panoramic images 006.8
46	REAL-TIME CAPTURE AND RENDERING OF PHYSICAL SCENE WITH AN EFFICIENTLY CALIBRATED RGB-D CAMERA NETWORK Su, Po-Chang 01 January 2017 From object tracking to 3D reconstruction, RGB-Depth (RGB-D) camera networks play an increasingly important role in many vision and graphics applications. With the recent explosive growth of Augmented Reality (AR) and Virtual Reality (VR) platforms, utilizing RGB-D camera networks to capture and render dynamic physical space can enhance immersive experiences for users. To maximize coverage and minimize costs, practical applications often use a small number of RGB-D cameras and sparsely place them around the environment for data capturing. While sparse color camera networks have been studied for decades, the problems of extrinsic calibration of and rendering with sparse RGB-D camera networks are less well understood. Extrinsic calibration is difficult because of inappropriate RGB-D camera models and lack of shared scene features. Due to the significant camera noise and sparse coverage of the scene, the quality of rendering 3D point clouds is much lower compared with synthetic models. Adding virtual objects whose rendering depends on the physical environment, such as those with reflective surfaces, further complicates the rendering pipeline. In this dissertation, I propose novel solutions to tackle these challenges faced by RGB-D camera systems. First, I propose a novel extrinsic calibration algorithm that can accurately and rapidly calibrate the geometric relationships across an arbitrary number of RGB-D cameras on a network. Second, I propose a novel rendering pipeline that can capture and render, in real time, dynamic scenes in the presence of arbitrary-shaped reflective virtual objects. Third, I have demonstrated a teleportation application that uses the proposed system to merge two geographically separated 3D captured scenes into the same reconstructed environment.
To provide a fast and robust calibration for a sparse RGB-D camera network, first, the correspondences between different camera views are established by using a spherical calibration object. We show that this approach outperforms other techniques based on planar calibration objects. Second, instead of modeling camera extrinsic using rigid transformation that is optimal only for pinhole cameras, different view transformation functions including rigid transformation, polynomial transformation, and manifold regression are systematically tested to determine the most robust mapping that generalizes well to unseen data. Third, the celebrated bundle adjustment procedure is reformulated to minimize the global 3D projection error so as to fine-tune the initial estimates. To achieve a realistic mirror rendering, a robust eye detector is used to identify the viewer's 3D location and render the reflective scene accordingly. The limited field of view obtained from a single camera is overcome by our calibrated RGB-D camera network system that is scalable to capture an arbitrarily large environment. The rendering is accomplished by raytracing light rays from the viewpoint to the scene reflected by the virtual curved surface. To the best of our knowledge, the proposed system is the first to render reflective dynamic scenes from real 3D data in large environments. Our scalable client-server architecture is computationally efficient - the calibration of a camera network system, including data capture, can be done in minutes using only commodity PCs. RGB-D Camera Network Real-time Capture and Rendering Virtual Curved Mirror 3D Telepresence 3D Interaction Computer Sciences Electrical and Computer Engineering Graphics and Human Computer Interfaces
47	Visual object perception in unstructured environments Choi, Changhyun 12 January 2015 As robotic systems move from well-controlled settings to increasingly unstructured environments, they are required to operate in highly dynamic and cluttered scenarios. Finding an object, estimating its pose, and tracking its pose over time within such scenarios are challenging problems. Although various approaches have been developed to tackle these problems, the scope of objects addressed and the robustness of solutions remain limited. In this thesis, we target robust object perception using visual sensory information, which spans from the traditional monocular camera to the more recently emerged RGB-D sensor, in unstructured environments. Toward this goal, we address four critical challenges to robust 6-DOF object pose estimation and tracking that current state-of-the-art approaches have, as yet, failed to solve. The first challenge is how to increase the scope of objects by allowing visual perception to handle both textured and textureless objects. A large number of 3D object models are widely available in online object model databases, and these object models provide significant prior information including geometric shapes and photometric appearances. We note that using both geometric and photometric attributes available from these models enables us to handle both textured and textureless objects. This thesis presents our efforts to broaden the spectrum of objects to be handled by combining geometric and photometric features. The second challenge is how to dependably estimate and track the pose of an object despite the clutter in backgrounds. Difficulties in object perception rise with the degree of clutter. Background clutter is likely to lead to false measurements, and false measurements tend to result in inaccurate pose estimates.
To tackle significant clutter in backgrounds, we present two multiple pose hypotheses frameworks: a particle filtering framework for tracking and a voting framework for pose estimation. Handling of object discontinuities during tracking, such as severe occlusions, disappearances, and blurring, presents another important challenge. In an ideal scenario, a tracked object is visible throughout the entirety of tracking. However, when an object happens to be occluded by other objects or disappears due to the motions of the object or the camera, difficulties ensue. Because the continuous tracking of an object is critical to robotic manipulation, we propose to devise a method to measure tracking quality and to re-initialize tracking as necessary. The final challenge we address is performing these tasks within real-time constraints. Our particle filtering and voting frameworks, while time-consuming, are composed of repetitive, simple and independent computations. Inspired by that observation, we propose to run massively parallelized frameworks on a GPU for those robotic perception tasks which must operate within strict time constraints. Computer vision Robotic perception Visual tracking Object recognition Pose estimation Particle filtering Voting process RGB-D camera Monocular Geometric feature Photometric feature Unstructured environments GPU Real-time
48	Visual Tracking of Deformation and Classification of Object Elasticity with Robotic Hand Probing Hui, Fei January 2017 Performing tasks with a robotic hand often requires a complete knowledge of the manipulated object, including its properties (shape, rigidity, surface texture) and its location in the environment, in order to ensure safe and efficient manipulation. While well-established procedures exist for the manipulation of rigid objects, as well as several approaches for the manipulation of linear or planar deformable objects such as ropes or fabric, research addressing the characterization of deformable objects occupying a volume remains relatively limited. The fundamental objectives of this research are to track the deformation of non-rigid objects under robotic hand manipulation using RGB-D data, and to automatically classify deformable objects as either rigid, elastic, plastic, or elasto-plastic, based on the material they are made of, and to support recognition of the category of such objects through a robotic probing process in order to enhance manipulation capabilities. The goal is not to attempt to formally model the material of the object, but rather employ a data-driven approach to make decisions based on the observed properties of the object, capture implicitly its deformation behavior, and support adaptive control of a robotic hand for other research in the future. The proposed approach advantageously combines color image and point cloud processing techniques, and proposes a novel combination of the fast level set method with a log-polar mapping of the visual data to robustly detect and track the contour of a deformable object in an RGB-D data stream. Dynamic time warping is employed to characterize the object properties independently from the varying length of the detected contour as the object deforms.
The research results demonstrate that a recognition rate over all categories of material of up to 98.3% is achieved based on the detected contour. When integrated in the control loop of a robotic hand, the approach can help ensure a stable grasp and safe manipulation that preserves the physical integrity of the object. Fast level set method Deformable objects RGB-D imaging Log-polar transform Contour tracking Elasticity classification Point cloud clustering Automatic color component selection
49	Comparing Structure from Motion Photogrammetry and Computer Vision for Low-Cost 3D Cave Mapping: Tipton-Haynes Cave, Tennessee Elmore, Clinton 01 August 2019 Natural caves represent one of the most difficult environments to map with modern 3D technologies. In this study I tested two relatively new methods for 3D mapping in Tipton-Haynes Cave near Johnson City, Tennessee: Structure from Motion Photogrammetry and Computer Vision using Tango, an RGB-D (Red Green Blue and Depth) technology. Many different aspects of these two methods were analyzed with respect to the needs of average cave explorers. Major considerations were cost, time, accuracy, durability, simplicity, lighting setup, and drift. The 3D maps were compared to a conventional cave map drafted with measurements from a modern digital survey instrument called the DistoX2, a clinometer, and a measuring tape. Both 3D mapping methods worked, but photogrammetry proved to be too time consuming and laborious for capturing more than a few meters of passage. RGB-D was faster, more accurate, and showed promise for the future of low-cost 3D cave mapping. Speleology Caves Cave Mapping 3D Mapping Google Tango RGB-D Photogrammetry Structure from Motion Paleontology DistoX Cave Survey Geology Other Earth Sciences
50	Semantic UFOMap : Semantic Information in Octree Occupancy Maps / Semantic UFOMap : Semantisk Information för Octree Robotkartor von Platen, Edvin January 2021 Many autonomous robots operating in unknown and unstructured environments rely on building a dense 3D map of the environment during exploration. What tasks the robot can perform depends on the information stored in this map. Most 3D maps currently in use store information required for robot control and environment reconstruction – is this point in space occupied, or safe to navigate to? To enable more complex tasks additional information is required. We introduce Semantic UFOMap, an open-source octree based mapping framework designed for online use on limited hardware. It is capable of real-time fusion and querying of semantic instances in the map, enabling high-level robot tasks and human-robot interaction. The online capabilities are evaluated using ground-truth data, where we show competitive results compared to voxel hashing, with optimizations still available. Additionally, we demonstrate a potential application with a simulated autonomous exploration and object navigation experiment. The evaluation shows that Semantic UFOMap is capable of real-time online performance. Storing semantic information in the map has the potential to open up new autonomous robot applications and yield improvements in existing tasks. / Autonoma robotar som opererar i okända och ostrukturerade miljöer är ofta beroende av att skapa en 3D-karta under utforskning av området. Vilka uppgifter roboten kan utföra beror på informationen som finns tillgänglig i kartan. De flesta nuvarande kartor som används sparar information som behövs för säker navigation och miljörekonstruktion – är den här positionen ett hinder, eller är den säker att navigera till? För att möjliggöra mer komplexa uppgifter behöver roboten ha tillgång till ytterligare information.
Vi presenterar Semantic UFOMap, ett öppen källkods kartläggnings ramverk för realtids användning på begränsad hårdvara. Genom att klara av realtids integrering och sökning av semantiska instanser i kartan möjliggör ramverket mer komplexa uppgifter och öppnar upp fler användningsområden i människa-robot interaktion. Utvärdering görs med hjälp av inspelad data, vi visar konkurrenskraftiga resultat jämfört med voxel hashning, med optimering fortfarande tillgänglig. Ett användningsområde demonstreras med ett simulerat autonomt utforsknings och objektnavigerings experiment. Utvärderingen visar att Semantic UFOMap klarar av realtids applikationer. Att spara semantisk information i kartan har potential att öppna upp för nya användningsområden inom robotik och leda till förbättringar i befintliga uppgifter. Mapping Octrees RGB-D perception Motion and path planning Semantics Robots. Kartläggning Octree Färg- och djupperception Rörelse- och vägplannering Semantik Robotar. Computer Sciences Datavetenskap (datalogi)
