1 |
3D Shape Detection for Augmented Reality / 3D form-detektion för förstärkt verklighet
Anadon Leon, Hector, January 2018
Previous work on 2D object recognition has shown exceptional results. However, those models cannot sense the spatial information of the environment: where the objects are and what they are. Such knowledge could bring improvements in several fields, such as Augmented Reality, by allowing virtual characters to interact more realistically with the environment, and autonomous cars, by enabling better decisions based on where objects are in 3D space. This work shows that it is possible to predict 3D bounding boxes with semantic labels for 3D object detection, and a set of primitives for 3D shape recognition, for multiple objects in an indoor scene, using an algorithm that takes as input an RGB image and its 3D information. It uses Deep Neural Networks with novel architectures for point cloud feature extraction. A single feature vector represents the latent space of the object, modeling its shape, position, size and orientation, and is used for multi-task prediction trained end-to-end on unbalanced datasets. The method runs in real time (5 frames per second) on a live video feed. It is evaluated on the NYU Depth Dataset V2 using Average Precision for object detection, and 3D Intersection over Union and surface-to-surface distance for 3D shape. The results confirm that a shared feature vector can be used for more than one prediction task and that the method generalizes to objects not seen during training, achieving state-of-the-art results for 3D object detection and 3D shape prediction on the NYU Depth Dataset V2. Qualitative results on specially captured real data show the potential for navigation in a real-world indoor environment and for collisions between animations and the detected objects, improving character-environment interaction in Augmented Reality applications.
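As an illustration of the 3D Intersection over Union metric named in the abstract, the following is a minimal sketch for axis-aligned boxes. The thesis predicts oriented boxes, so the axis-aligned case is a simplifying assumption made here, and the names `Box`, `box_volume` and `iou_3d` are hypothetical rather than taken from the thesis.

```python
from typing import Tuple

# Assumed box layout: (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned.
Box = Tuple[float, float, float, float, float, float]


def box_volume(b: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    return (
        max(0.0, b[3] - b[0])
        * max(0.0, b[4] - b[1])
        * max(0.0, b[5] - b[2])
    )


def iou_3d(a: Box, b: Box) -> float:
    """Intersection volume divided by union volume of two boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])          # start of the overlap on this axis
        hi = min(a[axis + 3], b[axis + 3])  # end of the overlap on this axis
        if hi <= lo:
            return 0.0                      # no overlap on this axis
        inter *= hi - lo
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0
```

Two cubes of side 2 offset by 1 along every axis overlap in a unit cube, so their IoU is 1 / (8 + 8 - 1) = 1/15.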
|
2 |
Immersive Dynamic Scenes for Virtual Reality from a Single RGB-D Camera
Lai, Po Kong, 26 September 2019
In this thesis we explore the concepts and components which can be used as individual building blocks for producing immersive virtual reality (VR) content from a single RGB-D sensor. We identify the properties of immersive VR videos and propose a system composed of a foreground/background separator, a dynamic scene re-constructor and a shape completer.
We initially explore the foreground/background separator component in the context of video summarization. More specifically, we examine how to extract trajectories of moving objects from video sequences captured with a static camera. We then present a new approach for video summarization via minimization of the spatial-temporal projections of the extracted object trajectories. New evaluation criteria are also presented for video summarization. These concepts of foreground/background separation can then be applied towards VR scene creation by extracting relevant objects of interest.
We present an approach for the dynamic scene re-constructor component using a single moving RGB-D sensor. By tracking the foreground objects and removing them from the input RGB-D frames, we can feed the background-only data into existing RGB-D SLAM systems. The result is a static 3D background model onto which the foreground frames are then superimposed to produce a coherent scene with dynamic, moving foreground objects. We also present a specific method for extracting moving foreground objects from a moving RGB-D camera, along with an evaluation dataset with benchmarks.
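The foreground-removal step described above can be sketched as a per-pixel split of a depth frame. This is only an illustration under stated assumptions: the foreground mask is taken as given (the thesis obtains it by tracking, which is not reproduced here), depth value 0.0 is assumed to mean "no measurement", and the name `split_depth_frame` is hypothetical.

```python
from typing import List, Tuple

DepthFrame = List[List[float]]
Mask = List[List[bool]]

INVALID_DEPTH = 0.0  # assumption: 0.0 marks a pixel with no depth measurement


def split_depth_frame(
    depth: DepthFrame, foreground: Mask
) -> Tuple[DepthFrame, DepthFrame]:
    """Split one depth frame into background-only and foreground-only frames.

    Pixels belonging to the other layer are set to INVALID_DEPTH, so a
    downstream consumer that skips invalid pixels sees only one layer.
    """
    background_only = [
        [INVALID_DEPTH if is_fg else d for d, is_fg in zip(d_row, m_row)]
        for d_row, m_row in zip(depth, foreground)
    ]
    foreground_only = [
        [d if is_fg else INVALID_DEPTH for d, is_fg in zip(d_row, m_row)]
        for d_row, m_row in zip(depth, foreground)
    ]
    return background_only, foreground_only
```

In this sketch the background-only frames would be the input to a SLAM system, while the foreground-only frames are kept aside to be superimposed on the reconstructed background.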
Lastly, the shape completer component takes a single-view depth map of an object as input and "fills in" the occluded portions to produce a complete 3D shape. We present an approach that uses a new data-minimal representation, the additive depth map, which allows traditional 2D convolutional neural networks to accomplish the task. The additive depth map represents the amount of depth required to transform the input into the "back depth map" that would exist if there were a sensor exactly opposite the input. We train and benchmark our approach using existing synthetic datasets and also show that it can perform shape completion on real-world data without fine-tuning. Our experiments show that our data-minimal representation can achieve results comparable to existing state-of-the-art 3D networks while also being able to produce higher-resolution outputs.
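One way to read the additive depth map is as a per-pixel thickness: adding it to the input (front) depth gives the back depth, and the object occupies the space between the two. The sketch below illustrates that reading only. Marking background pixels with `None`, clamping negative thickness to zero, the orthographic depth-bin layout, and the names `back_depth_map` and `occupancy_columns` are all assumptions made for illustration, not details from the thesis.

```python
from typing import List, Optional

# None marks a background pixel with no object depth (an assumption).
DepthMap = List[List[Optional[float]]]


def back_depth_map(front: DepthMap, additive: DepthMap) -> DepthMap:
    """Back depth = front depth + non-negative additive depth, per pixel."""
    back: DepthMap = []
    for f_row, a_row in zip(front, additive):
        row: List[Optional[float]] = []
        for f, a in zip(f_row, a_row):
            # Background stays background; negative thickness is clamped to 0.
            row.append(None if f is None or a is None else f + max(0.0, a))
        back.append(row)
    return back


def occupancy_columns(
    front: DepthMap, back: DepthMap, z_min: float, z_step: float, n_bins: int
) -> List[List[List[bool]]]:
    """Mark depth bins whose centre lies between front and back depth."""
    centres = [z_min + (k + 0.5) * z_step for k in range(n_bins)]
    return [
        [
            [f is not None and b is not None and f <= c <= b for c in centres]
            for f, b in zip(f_row, b_row)
        ]
        for f_row, b_row in zip(front, back)
    ]
```

Because both maps are ordinary 2D arrays, the output resolution in depth is set only by `z_step` when the occupancy is built, which is one way to see why a 2D representation can yield higher-resolution shapes than a fixed 3D grid.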
|