Global ETD Search

21	La réalité augmentée : fusion de vision et navigation Zarrouati-Vissière, Nadège 20 December 2013 (has links) (PDF) Cette thèse a pour objet l'étude d'algorithmes pour des applications de réalité visuellement augmentée. Plusieurs besoins existent pour de telles applications, qui sont traités en tenant compte de la contrainte d'indistinguabilité de la profondeur et du mouvement linéaire dans le cas de l'utilisation de systèmes monoculaires. Pour insérer en temps réel de manière réaliste des objets virtuels dans des images acquises dans un environnement arbitraire et inconnu, il est non seulement nécessaire d'avoir une perception 3D de cet environnement à chaque instant, mais également d'y localiser précisément la caméra. Pour le premier besoin, on fait l'hypothèse d'une dynamique de la caméra connue, pour le second on suppose que la profondeur est donnée en entrée: ces deux hypothèses sont réalisables en pratique. Les deux problèmes sont posés dans lecontexte d'un modèle de caméra sphérique, ce qui permet d'obtenir des équations de mouvement invariantes par rotation pour l'intensité lumineuse comme pour la profondeur. L'observabilité théorique de ces problèmes est étudiée à l'aide d'outils de géométrie différentielle sur la sphère unité Riemanienne. Une implémentation pratique est présentée: les résultats expérimentauxmontrent qu'il est possible de localiser une caméra dans un environnement inconnu tout en cartographiant précisément cet environnement. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Observateur asymptotique Realité augmentée Systèmes de navigation Données RGBD Algorithmes SLAM
22	Vision-driven assembly robot Pozo León, Antonio, von Knoop Ruiz, Alexander January 2022 (has links) This project focuses on the integration of Computer Vision (CV) and robotics to automate object assembly tasks using a collaborative robot. By combining both disciplines, the project aims to improve productivity and safety in multiple environments by enabling safe and efficient assembly operations. To this end, an RGBD camera will be used as a vision system to capture both color and depth data from the environment. A further aspect to highlight is the importance of education and training for operators to enhance accessibility and familiarity with manipulator robots beyond the robotics industry and for the creation of adapted paths without knowledge of conventional robot path programming. The objectives cover conducting research, developing simulation environments, implementing algorithms, and their application in both real and virtual robots. This project is intended to enhance automation and efficiency while promoting the advantages of computer vision and robotics in industrial applications.
23	Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation Dickens, James 01 September 2021 (has links) The rise of convolutional neural networks (CNNs) in the context of computer vision has occurred in tandem with the advancement of depth sensing technology. Depth cameras are capable of yielding two-dimensional arrays storing at each pixel the distance from objects and surfaces in a scene from a given sensor, aligned with a regular color image, obtaining so-called RGBD images. Inspired by prior models in the literature, this work develops a suite of RGBD CNN models to tackle the challenging tasks of object detection, instance segmentation, and semantic segmentation. Prominent architectures for object detection and image segmentation are modified to incorporate dual backbone approaches inputting RGB and depth images, combining features from both modalities through the use of novel fusion modules. For each task, the models developed are competitive with state-of-the-art RGBD architectures. In particular, the proposed RGBD object detection approach achieves 53.5% mAP on the SUN RGBD 19-class object detection benchmark, while the proposed RGBD semantic segmentation architecture yields 69.4% accuracy with respect to the SUN RGBD 37-class semantic segmentation benchmark. An original 13-class RGBD instance segmentation benchmark is introduced for the SUN RGBD dataset, for which the proposed model achieves 38.4% mAP. Additionally, an original depth-aware panoptic segmentation model is developed, trained, and tested for new benchmarks conceived for the NYUDv2 and SUN RGBD datasets. These benchmarks offer researchers a baseline for the task of RGBD panoptic segmentation on these datasets, where the novel depth-aware model outperforms a comparable RGB counterpart. Deep learning Computer vision CNN Object detection Semantic segmentation Instance segmentation Multi-modal deep learning Panoptic segmentation Artificial intelligence Convolutional neural networks Neural networks RGBD Depth images
24	Improving deep monocular depth predictions using dense narrow field of view depth images Möckelind, Christoffer January 2018 (has links) In this work we study a depth prediction problem where we provide a narrow field of view depth image and a wide field of view RGB image to a deep network tasked with predicting the depth for the entire RGB image. We show that by providing a narrow field of view depth image, we improve results for the area outside the provided depth compared to an earlier approach only utilizing a single RGB image for depth prediction. We also show that larger depth maps provide a greater advantage than smaller ones and that the accuracy of the model decreases with the distance from the provided depth. Further, we investigate several architectures as well as study the effect of adding noise and lowering the resolution of the provided depth image. Our results show that models provided low resolution noisy data performs on par with the models provided unaltered depth. / I det här arbetet studerar vi ett djupapproximationsproblem där vi tillhandahåller en djupbild med smal synvinkel och en RGB-bild med bred synvinkel till ett djupt nätverk med uppgift att förutsäga djupet för hela RGB-bilden. Vi visar att genom att ge djupbilden till nätverket förbättras resultatet för området utanför det tillhandahållna djupet jämfört med en existerande metod som använder en RGB-bild för att förutsäga djupet. Vi undersöker flera arkitekturer och storlekar på djupbildssynfält och studerar effekten av att lägga till brus och sänka upplösningen på djupbilden. Vi visar att större synfält för djupbilden ger en större fördel och även att modellens noggrannhet minskar med avståndet från det angivna djupet. Våra resultat visar också att modellerna som använde sig av det brusiga lågupplösta djupet presterade på samma nivå som de modeller som använde sig av det omodifierade djupet. Deep learning Monocular Depth estimation Narrow field of view RGB RGBD Noicy depth Dense depth Narrow depth Sparse depth Computer Sciences Datavetenskap (datalogi)
25	Instance segmentation using 2.5D data Öhrling, Jonathan January 2023 (has links) Multi-modality fusion is an area of research that has shown promising results in the domain of 2D and 3D object detection. However, multi-modality fusion methods have largely not been utilized in the domain of instance segmentation. This master’s thesis investigated if multi-modality fusion methods can be applied to deep learning instance segmentation models to improve their performance on multi-modality data. The two multi-modality fusion methods presented, input extension and feature fusions, were applied to a two-stage instance segmentation model, Mask R-CNN, and a single-stage instance segmentation model, RTMDet. Models were trained on different variations of preprocessed RGBD and ToF data provided by SICK IVP, as well as RGBD data from the publicly available NYUDepth dataset. The master’s thesis concludes that the multi-modality fusion method presented as feature fusion can be applied to the Mask R-CNN model to improve the networks performance by 1.8%points (1.8%pt.) bounding box mAP and 1.6%pt. segmentation mAP on SICK RGBD, 7.7%pt. bounding box mAP and 7.4%pt. segmentation mAP on ToF, and 7.4%pt. bounding box mAP and 7.4%pt. segmentation mAP on NYUDepth. The RTMDet model saw little to no improvements from the inclusion of depth but had similar baseline performance as the improved Mask R-CNN model that utilized feature fusion. The input extension method saw no improvements to performance as it faced technical implementation limitations. instance segmentation multi-modality segmentation multi-modality fusion CNN RGBD ToF Mask R-CNN RTMDet MMDetection COCO NYUDepth Computer Engineering Datorteknik
26	3D real time object recognition Amplianitis, Konstantinos 01 March 2017 (has links) Die Objekterkennung ist ein natürlicher Prozess im Menschlichen Gehirn. Sie ndet im visuellen Kortex statt und nutzt die binokulare Eigenschaft der Augen, die eine drei- dimensionale Interpretation von Objekten in einer Szene erlaubt. Kameras ahmen das menschliche Auge nach. Bilder von zwei Kameras, in einem Stereokamerasystem, werden von Algorithmen für eine automatische, dreidimensionale Interpretation von Objekten in einer Szene benutzt. Die Entwicklung von Hard- und Software verbessern den maschinellen Prozess der Objek- terkennung und erreicht qualitativ immer mehr die Fähigkeiten des menschlichen Gehirns. Das Hauptziel dieses Forschungsfeldes ist die Entwicklung von robusten Algorithmen für die Szeneninterpretation. Sehr viel Aufwand wurde in den letzten Jahren in der zweidimen- sionale Objekterkennung betrieben, im Gegensatz zur Forschung zur dreidimensionalen Erkennung. Im Rahmen dieser Arbeit soll demnach die dreidimensionale Objekterkennung weiterent- wickelt werden: hin zu einer besseren Interpretation und einem besseren Verstehen von sichtbarer Realität wie auch der Beziehung zwischen Objekten in einer Szene. In den letzten Jahren aufkommende low-cost Verbrauchersensoren, wie die Microsoft Kinect, generieren Farb- und Tiefendaten einer Szene, um menschenähnliche visuelle Daten zu generieren. Das Ziel hier ist zu zeigen, wie diese Daten benutzt werden können, um eine neue Klasse von dreidimensionalen Objekterkennungsalgorithmen zu entwickeln - analog zur Verarbeitung im menschlichen Gehirn. / Object recognition is a natural process of the human brain performed in the visual cor- tex and relies on a binocular depth perception system that renders a three-dimensional representation of the objects in a scene. Hitherto, computer and software systems are been used to simulate the perception of three-dimensional environments with the aid of sensors to capture real-time images. In the process, such images are used as input data for further analysis and development of algorithms, an essential ingredient for simulating the complexity of human vision, so as to achieve scene interpretation for object recognition, similar to the way the human brain perceives it. The rapid pace of technological advancements in hardware and software, are continuously bringing the machine-based process for object recognition nearer to the inhuman vision prototype. The key in this eld, is the development of algorithms in order to achieve robust scene interpretation. A lot of recognisable and signi cant e ort has been successfully carried out over the years in 2D object recognition, as opposed to 3D. It is therefore, within this context and scope of this dissertation, to contribute towards the enhancement of 3D object recognition; a better interpretation and understanding of reality and the relationship between objects in a scene. Through the use and application of low-cost commodity sensors, such as Microsoft Kinect, RGB and depth data of a scene have been retrieved and manipulated in order to generate human-like visual perception data. The goal herein is to show how RGB and depth information can be utilised in order to develop a new class of 3D object recognition algorithms, analogous to the perception processed by the human brain. 3D Objekt Erkennung 3D Mensch Segmentierung Objekt Erkennung Conditional Random Fields Kinect-Sensor RGBD-Daten 3D Rekonstruktionen Bundle Adjustment ICP Registrierung 3D Object Recognition 3D Human Segmentation Object Detection Conditional Random Fields Kinect Sensor RGBD Data 3D Reconstructions Bundle Adjustment ICP registration 004 Informatik 28 Informatik, Datenverarbeitung ST 330 ddc:004
27	Occlusion Management in Conventional and Head-Mounted Display Visualization through the Relaxation of the Single Viewpoint/Timepoint Constraint Meng-Lin Wu (6916283) 16 August 2019 (has links) <div>In conventional computer graphics and visualization, images are synthesized following the planar pinhole camera (PPC) model. The PPC approximates physical imaging devices such as cameras and the human eye, which sample the scene with linear rays that originate from a single viewpoint, i.e. the pinhole. In addition, the PPC takes a snapshot of the scene, sampling it at a single instant in time, or timepoint, for each image. Images synthesized with these single viewpoint and single timepoint constraints are familiar to the user, as they emulate images captured with cameras or perceived by the human visual system. However, visualization using the PPC model suffers from the limitation of occlusion, when a region of interest (ROI) is not visible due to obstruction by other data. The conventional solution to the occlusion problem is to rely on the user to change the view interactively to gain line of sight to the scene ROIs. This approach of sequential navigation has the shortcomings of (1) inefficiency, as navigation is wasted when circumventing an occluder does not reveal an ROI, (2) inefficacy, as a moving or a transient ROI can hide or disappear before the user reaches it, or as scene understanding requires visualizing multiple distant ROIs in parallel, and (3) user confusion, as back-and-forth navigation for systematic scene exploration can hinder spatio-temporal awareness.</div><div><br></div><div>In this thesis we propose a novel paradigm for handling occlusions in visualization based on generalizing an image to incorporate samples from multiple viewpoints and multiple timepoints. The image generalization is implemented at camera model level, by removing the same timepoint restriction, and by removing the linear ray restriction, allowing for curved rays that are routed around occluders to reach distant ROIs. The paradigm offers the opportunity to greatly increase the information bandwidth of images, which we have explored in the context of both desktop and head-mounted display visualization, as needed in virtual and augmented reality applications. The challenges of multi-viewpoint multi-timepoint visualization are (1) routing the non-linear rays to find all ROIs or to reach all known ROIs, (2) making the generalized image easy to parse by enforcing spatial and temporal continuity and non-redundancy, (3) rendering the generalized images quickly as required by interactive applications, and (4) developing algorithms and user interfaces for the intuitive navigation of the compound cameras with tens of degrees of freedom. We have addressed these challenges (1) by developing a multiperspective visualization framework based on a hierarchical camera model with PPC and non-PPC leafs, (2) by routing multiple inflection point rays with direction coherence, which enforces visualization continuity, and without intersection, which enforces non-redundancy, (3) by designing our hierarchical camera model to provide closed-form projection, which enables porting generalized image rendering to the traditional and highly-efficient projection followed by rasterization pipeline implemented by graphics hardware, and (4) by devising naturalistic user interfaces based on tracked head-mounted displays that allow deploying and retracting the additional perspectives intuitively and without simulator sickness.</div> Applied Computer Science Occlusion management Camera models Multiperspective visualization Interactive visualization Focus+context Augmented reality Virtual reality Navigation Depth cues Visualization techniques Rendering Reconstruction Image-based rendering RGBD video Impostor
28	A three-dimensional representation method for noisy point clouds based on growing self-organizing maps accelerated on GPUs Orts-Escolano, Sergio 21 January 2014 (has links) The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problem and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction. 3D representation method Growing neural gas Self-organizing maps Topology preservation Parallel computing CUDA Real-time Point cloud 3D reconstruction GPGPU RGBD Noisy 3D data Object recognition
29	Incorporating Scene Depth in Discriminative Correlation Filters for Visual Tracking Stynsberg, John January 2018 (has links) Visual tracking is a computer vision problem where the task is to follow a targetthrough a video sequence. Tracking has many important real-world applications in several fields such as autonomous vehicles and robot-vision. Since visual tracking does not assume any prior knowledge about the target, it faces different challenges such occlusion, appearance change, background clutter and scale change. In this thesis we try to improve the capabilities of tracking frameworks using discriminative correlation filters by incorporating scene depth information. We utilize scene depth information on three main levels. First, we use raw depth information to segment the target from its surroundings enabling occlusion detection and scale estimation. Second, we investigate different visual features calculated from depth data to decide which features are good at encoding geometric information available solely in depth data. Third, we investigate handling missing data in the depth maps using a modified version of the normalized convolution framework. Finally, we introduce a novel approach for parameter search using genetic algorithms to find the best hyperparameters for our tracking framework. Experiments show that depth data can be used to estimate scale changes and handle occlusions. In addition, visual features calculated from depth are more representative if they were combined with color features. It is also shown that utilizing normalized convolution improves the overall performance in some cases. Lastly, the usage of genetic algorithms for hyperparameter search leads to accuracy gains as well as some insights on the performance of different components within the framework. Tracking Visual Deep Learning Machine Learning CNN Convolutional Neural Network Unsupervised Learning Clustering Genetic Algorithms Features Visual featues Channel Coding RGBD Scene Depth Map Kinect Discriminative Correlation Filters SRDCF DCF Spatial Spatially Regularized Hyperparameter Search Occlusion Detection Handling Kalman Filters Normalized Convolution Bayesian Gaussian Mixture Scale Estimation Conjugate Gradient Linkoping Sweden Visuell Följning Särdrag Djupa Faltningsnätverk Maskininlärning Djup Inlärning Genetiska Algoritmer Klustring Djup RGBD Linköping Sverige
30	3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING Ashley S Dale (8771429) 07 January 2021 (has links) <div> <div> <div> <p>An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real world data, F1 scores improved in four of the five classes: The average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1∗ = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ∗ <sub>F1 </sub>= 0.015, compared to σF 1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the net- work was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic datatset was generated with a Variational Autoencoder then analyzed using Principle Component Analysis and Uniform Manifold Projection and Approximation (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background. </p></div></div></div> Computer Engineering Software Engineering Applied Computer Science Information Systems Signal Processing Computer Graphics Computer Vision Image Processing Pattern Recognition and Data Mining Simulation and Modelling Virtual Reality and Related Simulation Applied Discrete Mathematics Coding and Information Theory Conceptual Modelling Information Engineering and Theory Machine Learning Mask R-CNN Artificial Intelligence Image Processing 3D Image Multispectral Data Signal Processing CNN DNN Object Detection Threat Detection Virtual Environments Synthetic Dataset F1 score image segmentation methods Image Segmentation RGBD RGBD video RGBZ Algorithms MS COCO Microsoft Common Objects in Context Transfer Learning

Search results