1 |
Self-supervised monocular image depth learning and confidence estimation
Chen, L., Tang, W., Wan, Tao Ruan, John, N.W. 17 June 2020
We present a novel self-supervised framework for monocular image depth learning and confidence estimation. Our framework reduces the amount of ground truth annotation data required for training Convolutional Neural Networks (CNNs), a requirement that often hinders the fast deployment of CNNs in many computer vision tasks. Our DepthNet adopts a novel fully differentiable patch-based cost function, built on the Zero-Mean Normalized Cross Correlation (ZNCC), that uses multi-scale patches as its matching and learning strategy. This approach greatly increases the accuracy and robustness of the depth learning. Because the proposed patch-based cost function naturally provides a 0-to-1 confidence, it is also used to self-supervise the training of a parallel network for confidence map learning and estimation, exploiting the fact that ZNCC is a normalized measure of similarity which can be approximated as the confidence of the depth estimation. The confidence map learning and estimation therefore operates in a self-supervised manner, in a network that runs in parallel to the DepthNet. Evaluations on the KITTI depth prediction evaluation dataset and the Make3D dataset show that our method outperforms the state-of-the-art results.
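As a rough illustration of the similarity measure named in this abstract, the sketch below computes ZNCC between two flattened image patches in plain Python. The function names, the `eps` guard, and the linear rescaling from [-1, 1] to [0, 1] are assumptions made for this example; the abstract does not state how the authors obtain their 0-to-1 confidence, and this is not their differentiable training loss.

```python
import math

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-mean normalized cross-correlation of two equal-size patches.

    Patches are flat lists of intensities. The result lies in [-1, 1];
    eps (an assumed guard) avoids division by zero for constant patches.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and the same size")
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    dev_a = [a - mean_a for a in patch_a]
    dev_b = [b - mean_b for b in patch_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return numerator / (denominator + eps)

def confidence(patch_a, patch_b):
    """Assumed mapping of ZNCC from [-1, 1] onto a 0-to-1 confidence."""
    return 0.5 * (zncc(patch_a, patch_b) + 1.0)
```

Because the patches are mean-centered and normalized, a patch and a brighter, higher-contrast copy of it still score close to 1, which is what makes the measure attractive for photometric matching.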
|
2 |
Depth Estimation Using Adaptive Bins via Global Attention at High Resolution
Bhat, Shariq 21 April 2021
We address the problem of estimating a high quality dense depth map from a
single RGB input image. We start out with a baseline encoder-decoder convolutional
neural network architecture and pose the question of how the global processing of
information can help improve overall depth estimation. To this end, we propose a
transformer-based architecture block that divides the depth range into bins whose
center value is estimated adaptively per image. The final depth values are estimated
as linear combinations of the bin centers. We call our new building block AdaBins.
Our results show a decisive improvement over the state-of-the-art on several popular
depth datasets across all metrics. We also validate the effectiveness of the proposed
block with an ablation study.
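The two steps described above (adaptive bin centers per image, then depth as a linear combination of those centers) can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, the bin widths and per-pixel scores would come from the network, and using a softmax over the scores as the combination weights is an assumption of this example.

```python
import math

def bin_centers(widths, d_min, d_max):
    """Centers of adjacent bins tiling [d_min, d_max].

    widths are positive per-image values; they are normalized here so
    that the bins exactly cover the depth range.
    """
    total = sum(widths)
    centers, left_edge = [], 0.0
    for w in widths:
        w = w / total
        centers.append(d_min + (d_max - d_min) * (left_edge + w / 2))
        left_edge += w
    return centers

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_depth(scores, centers):
    """Depth of one pixel as a weighted sum of bin centers.

    Assumption: the weights are a softmax over per-pixel bin scores.
    """
    return sum(p * c for p, c in zip(softmax(scores), centers))
```

For example, `bin_centers([1, 1, 2], 0.0, 8.0)` yields centers at 1.0, 3.0 and 6.0, and a pixel whose scores strongly favor the second bin receives a depth close to 3.0. Because the output is a weighted average rather than the single most likely bin, predicted depths vary smoothly instead of snapping to bin centers.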
|
3 |
Semantic Segmentation For Free Drive-able Space Estimation
Gallagher, Eric 02 October 2020
Autonomous vehicles need precise information about the Drive-able space in order to navigate safely. In recent years, deep learning and semantic segmentation have attracted intense research. It is a rapidly evolving field that continues to provide excellent results. Research has shown that deep learning is emerging as a powerful tool in many applications. The aim of this study is to develop a deep learning system to estimate the Free Drive-able space. Building on state-of-the-art deep learning techniques, semantic segmentation will be used to replace the need for highly accurate maps that are expensive to license. Free Drive-able space is defined as the drive-able space on the correct side of the road that can be reached without a collision with another road user or pedestrian. A state-of-the-art deep network will be trained with a custom dataset in order to learn complex driving decisions. Motivated by good results, further deep learning techniques will be applied to measure distance from monocular images. The findings demonstrate the power of deep learning techniques in complex driving decisions. The results also indicate the economic and technical feasibility of semantic segmentation over expensive high-definition maps.
|
4 |
Monocular Depth Estimation: Datasets, Methods, and Applications
Bauer, Zuria 15 September 2021
The World Health Organization (WHO) stated in February 2021 at the Seventy-Third World Health Assembly that, globally, at least 2.2 billion people have a near or distance vision impairment. They also noted the severe impact vision impairment has on the quality of life of the individual suffering from this condition, and how it affects their social well-being and economic independence in society, becoming in some cases an additional burden for the people in their immediate surroundings as well. In order to minimize the costs and intrusiveness of the applications and maximize the autonomy of the individual, the natural solution is to use systems that rely on computer vision algorithms. Systems that improve the quality of life of the visually impaired need to solve different problems such as localization, path recognition, obstacle detection, environment description, navigation, etc. Each of these topics involves an additional set of problems that have to be solved to address it. For example, the task of object detection requires depth prediction to know the distance to the object, path recognition to know if the user is on the road or on a pedestrian path, an alarm system to notify the user of danger, and trajectory prediction for the approaching obstacle, and those are only the main key points. On closer inspection, all of these topics have one key component in common: depth estimation/prediction. All of them need a correct estimation of the depth in the scene. In this thesis, our main focus is on addressing depth estimation in indoor and outdoor environments. Traditional depth estimation methods, like structure from motion and stereo matching, are built on feature correspondences from multiple viewpoints. Despite the effectiveness of these approaches, they need a specific type of data to perform properly.
Since our main goal is to provide systems with minimal costs and intrusiveness that are also easy to handle, we decided to infer the depth from single images: monocular depth estimation. Estimating the depth of a scene from a single image is a simple task for humans, but it is notoriously more difficult for computational models to achieve high accuracy with low resource requirements. Monocular Depth Estimation is this very task of estimating depth from a single RGB image. Since only one image is needed, this approach is used in applications such as autonomous driving, scene understanding or 3D modeling where other types of information are not available. This thesis presents contributions towards solving this task using deep learning as the main tool. The four main contributions of this thesis are: first, we carry out an extensive review of the state-of-the-art in monocular depth estimation; secondly, we introduce a novel large scale high resolution outdoor stereo dataset able to provide enough image information to solve various common computer vision problems; thirdly, we show a set of architectures able to predict monocular depth effectively; and, finally, we propose two real life applications of those architectures, addressing the topic of enhancing perception for the visually impaired using low-cost wearable sensors.
|
5 |
Monocular Depth Estimation with Edge-Based Constraints and Active Learning
January 2019
abstract: The ubiquity of single camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated from feature detection based algorithms as opposed to regular grid sparse patterns used by previous work. This work focuses on using these feature-based sparse patterns to generate additional depth information by interpolating regions between clusters of samples that are in close proximity to each other. These interpolated sparse depths are used to enforce additional constraints on the network’s predictions. In addition to the improved depth prediction performance observed from incorporating the sparse sample information in the network compared to pure RGB-based methods, the experiments show that actively retraining a network on a small number of samples that deviate most from the interpolated sparse depths leads to better depth prediction overall.
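The active retraining step in this paragraph, picking the few samples whose predictions deviate most from the interpolated sparse depths, might look like the sketch below. Everything here is an assumption for illustration: the function names are hypothetical, depth maps are flat lists, `None` marks pixels with no interpolated depth, and mean absolute difference stands in for whatever deviation measure the thesis actually uses.

```python
def deviation(pred, sparse):
    """Mean absolute difference over pixels that have an interpolated depth.

    pred and sparse are flat lists of equal length; None in sparse marks
    a pixel with no interpolated value (an assumed encoding).
    """
    diffs = [abs(p - s) for p, s in zip(pred, sparse) if s is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0

def select_for_retraining(preds, sparses, k):
    """Indices of the k samples that deviate most from their sparse depths."""
    scores = [deviation(p, s) for p, s in zip(preds, sparses)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Samples with no interpolated pixels score 0.0 in this sketch, so they are never preferred over samples with measurable disagreement.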
This thesis also introduces a new metric, titled Edge, to quantify model performance in regions of an image that show the highest change in ground truth depth values along either the x-axis or the y-axis. Existing metrics in depth estimation like Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) quantify model performance across the entire image and don’t focus on specific regions of an image that are hard to predict. To this end, the proposed Edge metric focuses specifically on these hard-to-predict regions. The experiments also show that using the Edge metric as a small addition to existing loss functions like L1 loss in current state-of-the-art methods leads to vastly improved performance in these hard-to-predict regions, while also improving performance across the board in every other metric.
Dissertation/Thesis / Masters Thesis Computer Engineering 2019
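One way to read the Edge metric described above is as an error restricted to the pixels where ground truth depth changes most along x or y. The sketch below follows that reading and contrasts it with whole-image MAE. It is not the thesis's definition: the forward-difference gradient, the `top_fraction` cutoff, the use of absolute error, and all function names are assumptions of this example.

```python
def edge_strength(gt):
    """Per-pixel larger absolute forward difference along x or y.

    gt is a 2-D depth map given as a list of equal-length rows.
    """
    h, w = len(gt), len(gt[0])
    strength = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(gt[y][x + 1] - gt[y][x]) if x + 1 < w else 0.0
            dy = abs(gt[y + 1][x] - gt[y][x]) if y + 1 < h else 0.0
            strength[y][x] = max(dx, dy)
    return strength

def edge_error(pred, gt, top_fraction=0.1):
    """Mean absolute error over the top_fraction of pixels by edge strength."""
    strength = edge_strength(gt)
    pixels = [(strength[y][x], abs(pred[y][x] - gt[y][x]))
              for y in range(len(gt)) for x in range(len(gt[0]))]
    pixels.sort(key=lambda item: item[0], reverse=True)
    k = max(1, int(len(pixels) * top_fraction))
    return sum(err for _, err in pixels[:k]) / k

def mae(pred, gt):
    """Mean absolute error over the whole image, for comparison."""
    errors = [abs(p - g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    return sum(errors) / len(errors)
```

On a toy map with a single depth step, a prediction that blurs the step can have a small whole-image MAE while its error on the step pixels is several times larger, which is the gap the abstract says global metrics fail to expose.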
|
6 |
Monocular Depth Estimation with Edge-Based Constraints using Active Learning Optimization
Saleh, Shadi 04 April 2024
Depth sensing is pivotal in robotics; however, monocular depth estimation encounters significant challenges. Existing algorithms relying on large-scale labeled data and large Deep Convolutional Neural Networks (DCNNs) hinder real-world applications. We propose two lightweight architectures that achieve commendable accuracy rates of 91.2% and 90.1%, simultaneously reducing the Root Mean Square Error (RMSE) of depth to 4.815 and 5.036. Our lightweight depth model operates at 29-44 FPS on the Jetson Nano GPU, showcasing efficient performance with minimal power consumption.
Moreover, we introduce a mask network designed to visualize and analyze the compact depth network, aiding in discerning informative samples for the active learning approach. This contributes to increased model accuracy and enhanced generalization capabilities.
Furthermore, our methodology encompasses the introduction of an active learning framework strategically designed to enhance model performance and accuracy by efficiently utilizing limited labeled training data. This novel framework outperforms previous studies by achieving commendable results with only 18.3% utilization of the KITTI Odometry dataset. This performance reflects a skillful balance between computational efficiency and accuracy, tailored for low-cost devices while reducing data training requirements.
1. Introduction
2. Literature Review
3. AI Technologies for Edge Computing
4. Monocular Depth Estimation Methodology
5. Implementation
6. Result and Evaluation
7. Conclusion and Future Scope
Appendix
|