Global ETD Search

1	Towards Scalable Deep 3D Perception and Generation
Qian, Guocheng (11 October 2023)
Scaling up 3D deep learning systems has emerged as a paramount issue with two primary facets. (1) Model scalability: designing a 3D network that is scale-friendly, i.e. one whose performance improves as parameters increase and that runs efficiently. Unlike 2D convolutional networks, 3D networks have to accommodate the irregularities of 3D data, such as respecting permutation invariance in point clouds. (2) Data scalability: high-quality 3D data is conspicuously scarce. 3D data acquisition and annotation are both complex and costly, hampering the development of scalable 3D deep learning. This dissertation delves into 3D deep learning, including both perception and generation, and addresses these scalability challenges. To address model scalability in 3D perception, I introduce ASSANet, an approach to efficient 3D point cloud representation learning that allows the model to scale up at a low computational cost while achieving substantial accuracy gains. I further introduce the PointNeXt framework, which focuses on data augmentation and architectural scalability and outperforms state-of-the-art 3D point cloud perception networks. To address data scalability, I present Pix4Point, which explores the use of abundant 2D images to enhance 3D understanding. For scalable 3D generation, I propose Magic123, which leverages a joint 2D and 3D diffusion prior for zero-shot image-to-3D content generation without requiring 3D supervision. These collective efforts provide pivotal solutions to model and data scalability in 3D deep learning.
Keywords: 3D Deep Learning; 3D Understanding; 3D Generation; Point Cloud
2	Indoor 3D Scene Understanding Using Depth Sensors
Lahoud, Jean (09 1900)
One of the main goals in computer vision is to achieve a human-like understanding of images. Image understanding has, however, mainly been studied in the 2D image frame, so more information is needed to relate images to the 3D world. With the emergence of 3D sensors (e.g. the Microsoft Kinect), which provide depth along with color information, the task of propagating 2D knowledge into 3D becomes more attainable and enables interaction between a machine (e.g. a robot) and its environment. This dissertation focuses on three aspects of indoor 3D scene understanding: (1) 2D-driven 3D object detection for single-frame scenes with inherent 2D information, (2) 3D object instance segmentation for 3D reconstructed scenes, and (3) using room and floor orientation for automatic labeling of indoor scenes, which could be used for self-supervised object segmentation. These methods capture the physical extents of 3D objects, such as their sizes and actual locations within a scene.
Keywords: Depth sensors; 3D Understanding; 3D instance segmentation; 3D object detection; self-supervised pretraining; object recognition
3	Towards Designing Robust Deep Learning Models for 3D Understanding
Hamdi, Abdullah (04 1900)
This dissertation presents novel methods for addressing important challenges related to the robustness of Deep Neural Networks (DNNs) for 3D understanding and in 3D setups. Our research focuses on two main areas: adversarial robustness on 3D data and setups, and the robustness of DNNs to realistic 3D scenarios. One paradigm for 3D understanding is to represent a 3D shape as a set of 3D points and learn functions on this set directly. Our first work, AdvPC, addresses the limited transferability of current 3D point cloud adversarial attacks and the ease of defending against them. By using a point cloud Auto-Encoder to generate more transferable attacks, AdvPC surpasses state-of-the-art attacks by a large margin on 3D point cloud attack transferability. Additionally, AdvPC increases the ability to break defenses by up to 38% compared to other baseline attacks on the ModelNet40 dataset. Another paradigm of 3D understanding is to perform 2D processing of multiple images of the 3D data. The second work, MVTN, addresses the problem of selecting viewpoints for 3D shape recognition by using a Multi-View Transformation Network (MVTN) to learn optimal viewpoints. Combining MVTN with multi-view approaches leads to state-of-the-art results on the standard benchmarks ModelNet40, ShapeNet Core55, and ScanObjectNN. MVTN also improves robustness to realistic scenarios such as rotation and occlusion. Our third work analyzes the semantic robustness of 2D Deep Neural Networks, addressing the high sensitivity of DNNs toward semantic primitives by visualizing the DNN global behavior as semantic maps and observing the interesting behavior of some DNNs. Additionally, we develop a bottom-up approach to detect robust regions of DNNs for scalable semantic robustness analysis and benchmarking of different DNNs.
The fourth work, SADA, showcases the lack of robustness in DNNs specifically for the safety-critical applications of autonomous navigation, beyond the simple classification setup. We present a general framework (BBGAN) for black-box adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task. BBGAN is trained to generate failure cases that consistently fool a trained agent on tasks such as object detection, self-driving, and autonomous UAV racing.
Keywords: computer vision; 3D understanding; robustness; deep learning; deep neural networks; adversarial robustness
