About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Monocular Depth Estimation with Edge-Based Constraints and Active Learning

January 2019 (has links)
abstract: The ubiquity of single-camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated by feature-detection-based algorithms, as opposed to the regular-grid sparse patterns used in previous work. This work uses these feature-based sparse patterns to generate additional depth information by interpolating regions between clusters of samples that are in close proximity to each other. These interpolated sparse depths are used to enforce additional constraints on the network's predictions. In addition to the improved depth prediction performance observed from incorporating the sparse sample information in the network compared to pure RGB-based methods, the experiments show that actively retraining a network on a small number of samples that deviate most from the interpolated sparse depths leads to better depth prediction overall. This thesis also introduces a new metric, called Edge, to quantify model performance in regions of an image that show the highest change in ground-truth depth values along either the x-axis or the y-axis. Existing metrics in depth estimation such as Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) quantify model performance across the entire image and do not focus on specific regions that are hard to predict. To this end, the proposed Edge metric focuses specifically on these hard-to-predict regions. The experiments also show that adding the Edge metric as a small term to existing loss functions, such as the L1 loss in current state-of-the-art methods, leads to vastly improved performance in these hard-to-predict regions, while also improving performance across the board in every other metric. / Dissertation/Thesis / Masters Thesis Computer Engineering 2019
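
The abstract describes the Edge metric only at a high level; the following is a minimal sketch of how such an edge-focused error could be computed, assuming the "highest change" regions are selected with a gradient-percentile threshold (an illustrative parameter, not stated in the abstract):

```python
import numpy as np

def edge_metric(pred, gt, percentile=95):
    """Hypothetical sketch of an edge-focused depth error.

    Selects pixels whose ground-truth depth gradient (along x or y)
    is above the given percentile, then reports the mean absolute
    error restricted to those pixels.
    """
    # Finite-difference gradients of the ground-truth depth map.
    gy, gx = np.gradient(gt.astype(np.float64))
    grad_mag = np.maximum(np.abs(gx), np.abs(gy))

    # Mask of "hard" pixels: highest ground-truth depth change.
    threshold = np.percentile(grad_mag, percentile)
    mask = grad_mag >= threshold

    return np.mean(np.abs(pred[mask] - gt[mask]))
```

Restricting the error to high-gradient pixels is what lets a metric like this expose depth-discontinuity failures that image-wide RMSE and MAE average away.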
22

Estimation of Defocus Blur in Virtual Environments Comparing Graph Cuts and Convolutional Neural Network

Chowdhury, Prodipto 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Depth estimation is one of the most important problems in computer vision. It has attracted a lot of attention because it has applications in many areas, such as robotics, VR and AR, and self-driving cars. Using the defocus blur of a camera lens is one method of depth estimation. In this thesis, we have researched this technique in virtual environments, and virtual datasets have been created for this purpose. We have applied graph cuts and a convolutional neural network (DfD-Net) to estimate depth from defocus blur using a natural (Middlebury) and a virtual (Maya) dataset. Graph cuts showed similar performance for both the natural and virtual datasets in terms of NMAE and NRMSE; however, with regard to SSIM, graph cuts performed 4% better on Middlebury than on Maya. We trained the DfD-Net on the natural dataset, on the virtual dataset, and on both datasets combined. The network trained on the virtual dataset performed best on both datasets. The performance of graph cuts and the DfD-Net has been compared: graph cuts performs 7% better than the DfD-Net in terms of SSIM for Middlebury images, while for Maya images the DfD-Net outperforms graph cuts by 2%. With regard to NRMSE, graph cuts and the DfD-Net show similar performance for Maya images; for Middlebury images, graph cuts is 1.8% better. The algorithms show no difference in performance in terms of NMAE. The DfD-Net generates depth maps roughly 500 times faster than graph cuts for Maya images and 200 times faster for Middlebury images.
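
The comparison above is reported in NMAE, NRMSE and SSIM. The abstract does not state how the errors are normalized, so the sketch below assumes normalization by the ground-truth depth range:

```python
import numpy as np

def nmae(pred, gt):
    # Mean absolute error, normalized by the ground-truth depth range
    # (an assumed normalization; the thesis may define it differently).
    return np.mean(np.abs(pred - gt)) / (gt.max() - gt.min())

def nrmse(pred, gt):
    # Root mean square error with the same assumed normalization.
    return np.sqrt(np.mean((pred - gt) ** 2)) / (gt.max() - gt.min())
```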
23

3-D Scene Reconstruction for Passive Ranging Using Depth from Defocus and Deep Learning

Emerson, David R. 08 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Depth estimation is becoming increasingly important in computer vision. The requirement for autonomous systems to gauge their surroundings is of the utmost importance in order to avoid obstacles, preventing damage to themselves and/or other systems or people. Depth measuring/estimation systems that use multiple cameras from multiple views can be expensive and extremely complex, and as these autonomous systems decrease in size and available power, the supporting sensors required to estimate depth must also shrink in size and power consumption. This research concentrates on a single passive method known as Depth from Defocus (DfD), which uses an in-focus and an out-of-focus image to infer the depth of objects in a scene. The major contribution of this research is the introduction of a new Deep Learning (DL) architecture that processes the in-focus and out-of-focus images to produce a depth map for the scene, improving both speed and performance over a range of lighting conditions. Compared to the previous state-of-the-art multi-label graph cuts algorithm applied to the synthetically blurred dataset, the DfD-Net produced a 34.30% improvement in the average Normalized Root Mean Square Error (NRMSE). Similarly, the DfD-Net architecture produced a 76.69% improvement in the average Normalized Mean Absolute Error (NMAE). Only the Structural Similarity Index (SSIM) showed a small average decrease of 2.68% when compared to the graph cuts algorithm. This slight reduction in the SSIM value is a result of the SSIM metric penalizing images that appear to be noisy: in some instances the DfD-Net output is mottled, which the SSIM metric interprets as noise. This research introduces two methods of deep learning architecture optimization. The first method employs a variant of the Particle Swarm Optimization (PSO) algorithm to improve the performance of the DfD-Net architecture. The PSO algorithm was able to find a combination of the number of convolutional filters, the size of the filters, the activation layers used, the use of a batch normalization layer between filters, and the size of the input image used during training that produced a network architecture with an average NRMSE approximately 6.25% better than the baseline DfD-Net average NRMSE. This optimized architecture also resulted in an average NMAE that was 5.25% better than the baseline DfD-Net average NMAE. Only the SSIM metric did not see a gain in performance, dropping by 0.26% compared to the baseline DfD-Net average SSIM value. The second method illustrates the use of a Self-Organizing Map clustering method to reduce the number of convolutional filters in the DfD-Net, reducing the overall run time of the architecture while still retaining the network performance exhibited prior to the reduction. This method produces a reduced DfD-Net architecture with a run-time decrease of between 14.91% and 44.85%, depending on the hardware running the network. The final reduced DfD-Net had an overall decrease in the average NRMSE value of approximately 3.4% when compared to the baseline, unaltered DfD-Net mean NRMSE value. The NMAE and SSIM results for the reduced architecture were 0.65% and 0.13% below the baseline results, respectively. This illustrates that reducing the network architecture's complexity does not necessarily reduce performance.
Finally, this research introduced a new real-world dataset captured using a camera with a voltage-controlled microfluidic lens for the visual data and a 2-D scanning LIDAR for the ground truth data. The visual data consist of images captured at seven different exposure times and 17 discrete voltage steps per exposure time. The objects in this dataset were divided into four repeating scene patterns in which the same surfaces were used. These scenes were located between 1.5 and 2.5 meters from the camera and LIDAR, so that any of the deep learning algorithms tested would see the same texture at multiple depths and multiple blurs. The DfD-Net architecture was employed in two separate tests using the real-world dataset. The first test synthetically blurred the real-world dataset and assessed the performance of the DfD-Net trained on the Middlebury dataset. For the scenes between 1.5 and 2.2 meters from the camera, the DfD-Net trained on the Middlebury dataset produced average NRMSE, NMAE and SSIM values that exceeded its test results on the Middlebury test set. The second test trained and tested solely on the real-world dataset. Analysis of the camera and lens behavior led to an optimal lens voltage step configuration of 141 and 129. Using this configuration, training the DfD-Net resulted in an average NRMSE, NMAE and SSIM of 0.0660, 0.0517 and 0.8028, with standard deviations of 0.0173, 0.0186 and 0.0641 respectively.
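
The first optimization method above tunes DfD-Net hyperparameters with a PSO variant. The abstract does not give the variant, so the following is a generic PSO sketch for minimizing a validation-error objective; rounding continuous particle positions to discrete choices (filter counts, filter sizes, and so on) is left to the assumed `evaluate` callback:

```python
import numpy as np

def pso_search(evaluate, bounds, n_particles=10, n_iters=20,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm search over architecture hyperparameters.

    `evaluate` maps a real-valued vector (e.g. [n_filters, filter_size,
    input_size]) to a validation error to be minimized.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    vel = np.zeros_like(pos)

    pbest, pbest_val = pos.copy(), np.array([evaluate(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()

    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([evaluate(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()
```

A call might look like `pso_search(evaluate, bounds=[(8, 128), (3, 9), (64, 256)])` for filter count, filter size and input size, with `evaluate` training a small proxy network and returning its validation NRMSE.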
24

Depth Estimation Methodology for Modern Digital Photography

Sun, Yi 01 October 2019 (has links)
No description available.
25

Light Field Video Processing and Streaming Using Applied AI

Hu, Xinjue 16 November 2022 (has links)
As a new form of volumetric media, a Light Field (LF) can provide users with a true 6 Degrees-Of-Freedom (DOF) immersive experience, because LF captures the scene with photo-realism, including aperture-limited changes in viewpoint. Nevertheless, the larger size and higher dimensionality of LF data bring greater challenges to processing and transmission. The main focus of this study is the application of applied Artificial Intelligence (AI) methods to the transmission and processing of LF data, thereby alleviating the performance bottlenecks in existing methods. Uncompressed LF data are too large for network transmission, which is why LF compression has become an important research topic. A new LF compression algorithm based on a Graph Neural Network (GNN) is proposed in this work. It uses the graph network model to fit the similarity between LF viewpoints, so that only the data of a few essential anchor viewpoints need to be transmitted after compression, and a complete LF matrix can be reconstructed from the graph model at the decoding end. This method also addresses the weak generalization of LF reconstruction algorithms on high-frequency components through a two-layer compression structure. Compared with existing compression methods, this algorithm achieves a higher compression ratio and better quality. Furthermore, to meet the real-time requirements of different LF applications and the robustness requirements of unreliable network environments, an adaptive LF video transmission scheme based on Multiple Description Coding (MDC) is proposed. It divides the LF matrix into descriptions at different downsampling ratios and optimizes the scheduling of the description transmission queue, allowing the basic GNN unit to be adjusted adaptively so that the method responds flexibly to real-time changes in user viewpoint requests, saving unnecessary viewpoint transmission overhead as much as possible and minimizing the adverse impact of packet loss and network-status fluctuations on LF transmission services. For LF processing, depth estimation has been a very hot topic in recent years. To achieve a good balance between performance on narrow- and wide-baseline LF data, a novel optical-flow-based LF depth estimation scheme is proposed, which uses a convolutional neural network (CNN) to predict the patch matrix after optical flow offset. After the optical-flow-assisted offset, the disparity between patches is mapped to a unified numerical range, which effectively solves the overfitting problem of LF depth estimation networks caused by the uneven distribution of baseline ranges in LF samples. Experimental results show that the proposed uniform-patch-based estimation mechanism generalizes well to LF data of different baselines and is compatible with various existing narrow-baseline LF depth estimation algorithms. Finally, since LF processing places high demands on both the computing and caching capabilities of the infrastructure, a framework that combines Multi-access Edge Computing (MEC) technology with LF applications is proposed in this thesis.
In this study, the problem is transformed using Lyapunov optimization, and an optimized search algorithm based on the Markov approximation method is designed, which can adaptively schedule and adjust the task offloading strategy and resource allocation scheme so as to provide users with the best service experience in the LF viewpoint interpolation task. Numerical results demonstrate that this edge-based framework can achieve a dynamic balance between energy and caching consumption while meeting the low-latency requirements of LF applications.
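
As one concrete reading of the MDC scheme described above, the sketch below splits a light field's view grid into layered descriptions by view subsampling and merges whichever descriptions arrive; the stride values and the layering are illustrative assumptions, not the thesis's actual description design:

```python
import numpy as np

def make_descriptions(lf, strides=(4, 2, 1)):
    """Split a light field lf[u, v, H, W, 3] into layered descriptions by
    subsampling the view grid (a hypothetical MDC-style split)."""
    return {s: lf[::s, ::s] for s in strides}

def reconstruct(received, full_shape):
    """Fill a view grid from whichever descriptions arrived, coarsest first,
    leaving missing views to be interpolated by the decoder."""
    out = np.zeros(full_shape, dtype=np.float32)
    have = np.zeros(full_shape[:2], dtype=bool)
    for s in sorted(received, reverse=True):  # coarse -> fine overwrite
        out[::s, ::s] = received[s]
        have[::s, ::s] = True
    return out, have
```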
26

MonoDepth-vSLAM: A Visual EKF-SLAM using Optical Flow and Monocular Depth Estimation

Dey, Rohit 04 October 2021 (has links)
No description available.
27

Dataset and Evaluation of Self-Supervised Learning for Panoramic Depth Estimation

Nett, Ryan 01 December 2020 (has links) (PDF)
Depth detection is a very common computer vision problem. It shows up primarily in robotics, automation, and 3D visualization domains, as it is essential for converting images to point clouds. One of the poster-child applications is self-driving cars. Currently, the best methods for depth detection are either very expensive, like LIDAR, or require precise calibration, like stereo cameras. These costs have given rise to attempts to detect depth from a monocular camera (a single camera). While this is possible, it is harder than LIDAR or stereo methods since depth can't be measured from monocular images; it has to be inferred. A good example is covering one eye: you still have some idea how far away things are, but it's not exact. Neural networks are a natural fit for this. Here, we build on previous neural network methods by applying a recent state-of-the-art model to panoramic images in addition to pinhole ones and performing a comparative evaluation. First, we create a simulated depth detection dataset that lends itself to panoramic comparisons and contains pre-made cylindrical and spherical panoramas. We then modify monodepth2 to support cylindrical and cubemap panoramas, incorporating current best practices for depth detection on those panorama types, and evaluate its performance for each type of image using our dataset. We also consider the resources used in training and other qualitative factors.
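
Supporting cubemap panoramas in a pipeline built for pinhole images requires reprojecting the equirectangular frames. Below is a simplified nearest-neighbour sketch of extracting one cubemap face, using one common face/axis convention; the dataset's actual preprocessing may differ:

```python
import numpy as np

def cubemap_face(equi, face, size=256):
    """Sample one cubemap face from an equirectangular panorama
    equi[H, W, 3]. Nearest-neighbour only, as a reprojection sketch."""
    H, W = equi.shape[:2]
    a, b = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size))
    one = np.ones_like(a)
    # Direction vectors for each face of a unit cube (one convention).
    x, y, z = {
        'front':  ( a, -b,  one), 'back':   (-a, -b, -one),
        'right':  ( one, -b, -a), 'left':   (-one, -b,  a),
        'top':    ( a,  one,  b), 'bottom': ( a, -one, -b),
    }[face]
    norm = np.sqrt(x**2 + y**2 + z**2)
    lon = np.arctan2(x, z)                 # longitude in [-pi, pi]
    lat = np.arcsin(y / norm)              # latitude in [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = ((0.5 - lat / np.pi) * H).astype(int).clip(0, H - 1)
    return equi[v, u]
```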
28

Real-Time GPU Scheduling with Preemption Support for Autonomous Mobile Robots

Bharmal, Burhanuddin Asifhusain 18 January 2022 (has links)
The use of graphics processing units (GPUs) in autonomous robots has grown recently due to their efficiency and suitability for data-intensive computation. However, current embedded GPU platforms may lack sufficient real-time capabilities for safety-critical autonomous systems: the GPU driver provides little to no control over the execution of computational kernels and does not allow multiple kernels to execute concurrently on integrated GPUs. With the development of modern embedded platforms with integrated GPUs, many embedded applications are GPU-accelerated; these applications are very computationally intensive, and they often have different criticality levels. In this thesis, we provide a software-based approach to schedule real-world robotics applications under two different scheduling policies: Fixed-Priority FIFO scheduling and Earliest-Deadline-First scheduling. We implement several applications commonly used in autonomous mobile robots, such as path planning, object detection, and depth estimation, and improve the response time of these applications. We test our framework on the NVIDIA AGX Xavier, which provides high computing power and supports eight different power modes. We measure the response times of all three applications with and without the scheduler on different power modes to evaluate the effectiveness of the scheduler. / Master of Science / The use of autonomous mobile robots for general human services has increased significantly with ever-advancing technology. Common applications of these robots include delivery services, search and rescue, hotel services, and so on. This thesis focuses on implementing the computational tasks performed by these robots as well as designing the task scheduler, to improve the overall performance of these tasks. The embedded hardware is resource-constrained, with limited memory, power, and operating frequency. The use of a graphics processing unit (GPU) for executing tasks to speed up operation has increased with the development of GPU programming frameworks. We propose a software-based GPU scheduler to execute functions on the GPU and get the best possible performance from the embedded hardware.
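
As an illustration of the Earliest-Deadline-First policy named above (a sketch only, not the thesis's actual GPU scheduling framework), the following keeps pending kernel-launch requests in a deadline-ordered heap:

```python
import heapq
import time

class EDFScheduler:
    """Minimal Earliest-Deadline-First queue for kernel-launch requests."""

    def __init__(self):
        self._queue = []   # (absolute_deadline, sequence, kernel_fn)
        self._seq = 0      # tie-breaker so heapq never compares functions

    def submit(self, kernel_fn, relative_deadline_s):
        deadline = time.monotonic() + relative_deadline_s
        heapq.heappush(self._queue, (deadline, self._seq, kernel_fn))
        self._seq += 1

    def run_next(self):
        # Launch the pending kernel with the earliest absolute deadline.
        if not self._queue:
            return None
        deadline, _, kernel_fn = heapq.heappop(self._queue)
        kernel_fn()
        return deadline

# Example: depth estimation gets a tighter deadline than path planning.
sched = EDFScheduler()
sched.submit(lambda: print("depth estimation kernel"), 0.010)
sched.submit(lambda: print("path planning kernel"), 0.050)
while sched.run_next() is not None:
    pass
```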
29

Semi-supervised learning for joint visual odometry and depth estimation

Papadopoulos, Kyriakos January 2024 (has links)
Autonomous driving has seen huge interest and improvements in the last few years. Two important functions of autonomous driving are depth and visual odometry estimation. Depth estimation refers to determining the distance from the camera to each point in the scene captured by the camera, while visual odometry refers to the estimation of ego motion using images recorded by the camera. The algorithm presented by Zhou et al. [1] is a completely unsupervised algorithm for depth and ego-motion estimation. This thesis sets out to minimize the ambiguity and enhance the performance of that algorithm. The purpose of the algorithm is to estimate the depth map given an image from a camera attached to the agent, and the ego motion of the agent; in the case of this thesis, the agent is a vehicle. The algorithm lacks the ability to make predictions at the true scale in both depth and ego motion; in other words, it suffers from scale ambiguity. Two extensions of the method were developed by changing the loss function of the algorithm and by supervising ego motion. Both methods show a remarkable improvement in performance and reduced ambiguity, utilizing only ego-motion ground truth data, which is significantly easier to obtain than depth ground truth data.
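
A minimal sketch of the kind of extension described, adding a supervised ego-motion term to an otherwise self-supervised objective, assuming a 6-DoF pose vector and a hypothetical weight `lam` (names and formulation are illustrative, not the thesis's exact loss):

```python
import torch

def total_loss(photometric_loss, pred_pose, gt_pose, lam=0.1):
    """Sketch of a semi-supervised objective: the usual self-supervised
    photometric term plus a supervised penalty on predicted ego motion
    (pose as 6-DoF [tx, ty, tz, rx, ry, rz])."""
    # Supervising translation in metres anchors the network to true scale,
    # which removes the scale ambiguity of purely unsupervised training.
    pose_loss = torch.mean(torch.abs(pred_pose - gt_pose))
    return photometric_loss + lam * pose_loss
```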
30

Monocular Depth Estimation Using Deep Convolutional Neural Networks

Larsson, Susanna January 2019 (has links)
For a long time, stereo cameras have been deployed in visual Simultaneous Localization And Mapping (SLAM) systems to gain 3D information. Even though stereo cameras show good performance, their main disadvantage is the complex and expensive hardware setup they require, which limits the use of the system. A simpler and cheaper alternative is the monocular camera; however, monocular images lack the important depth information. Recent works have shown that having access to depth maps in a monocular SLAM system is beneficial, since they can be used to improve the 3D reconstruction. This work proposes a deep neural network that predicts dense high-resolution depth maps from monocular RGB images by casting the problem as a supervised regression task. The network architecture follows an encoder-decoder structure in which multi-scale information is captured and skip connections are used to recover details. The network is trained and evaluated on the KITTI dataset, achieving results comparable to state-of-the-art methods. With further development, this network shows good potential to be incorporated in a monocular SLAM system to improve the 3D reconstruction.
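
A schematic PyTorch sketch of the architecture style described (encoder-decoder with skip connections, trained as supervised depth regression) follows; all layer sizes are illustrative assumptions, not the thesis's exact network:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Toy encoder-decoder with skip connections for dense depth regression."""

    def __init__(self):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.enc1, self.enc2, self.enc3 = block(3, 32), block(32, 64), block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec2 = block(128 + 64, 64)   # skip connection from enc2
        self.dec1 = block(64 + 32, 32)    # skip connection from enc1
        self.head = nn.Conv2d(32, 1, 3, padding=1)  # one depth channel

    def forward(self, x):
        s1 = self.enc1(x)                 # full-resolution features
        s2 = self.enc2(self.pool(s1))     # 1/2 resolution
        z = self.enc3(self.pool(s2))      # 1/4 resolution bottleneck
        d2 = self.dec2(torch.cat([self.up(z), s2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), s1], dim=1))
        return self.head(d1)              # dense depth map

# Supervised regression against ground-truth depth (e.g. from KITTI):
model = EncoderDecoder()
pred = model(torch.randn(1, 3, 64, 64))
```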
