  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Panodepth – Panoramic Monocular Depth Perception Model and Framework

Wong, Adley K 01 December 2022 (has links) (PDF)
Depth perception has become a heavily researched area as companies and researchers strive toward the development of self-driving cars. Self-driving cars rely on perceiving the surrounding area, which depends on technology capable of providing the system with depth perception. In this paper, we explore developing a single-camera (monocular) depth prediction model trained on panoramic depth images. Our model makes novel use of transfer learning with efficient encoder models, pre-training on a larger dataset of flat depth images, and optimization for deployment on a Jetson Nano. Additionally, we present a training and optimization framework that makes developing and testing new monocular depth perception models easier and faster. While the model failed to achieve a high frame rate, the framework and models developed are a promising starting place for future work.
2

Self-supervised monocular image depth learning and confidence estimation

Chen, L., Tang, W., Wan, Tao Ruan, John, N.W. 17 June 2020 (has links)
No / We present a novel self-supervised framework for monocular image depth learning and confidence estimation. Our framework reduces the amount of ground-truth annotation data required for training Convolutional Neural Networks (CNNs), which is often a bottleneck for the fast deployment of CNNs in many computer vision tasks. Our DepthNet adopts a novel, fully differentiable patch-based cost function built on Zero-Mean Normalized Cross-Correlation (ZNCC), using multi-scale patches as the matching and learning strategy. This approach greatly increases the accuracy and robustness of depth learning. Because the patch-based cost function naturally provides a 0-to-1 confidence measure, it is also used to self-supervise the training of a parallel network for confidence map learning and estimation, exploiting the fact that ZNCC is a normalized measure of similarity that can be approximated as the confidence of the depth estimate. The confidence map learning and estimation therefore operate in a self-supervised manner in a network parallel to the DepthNet. Evaluations on the KITTI depth prediction evaluation dataset and the Make3D dataset show that our method outperforms state-of-the-art results.
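The ZNCC similarity underlying the patch-based cost function described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard ZNCC formula, not the authors' implementation; the `eps` guard and the example patches are assumptions:

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-Mean Normalized Cross-Correlation between two equal-sized patches.

    Returns a similarity in [-1, 1]; values near 1 indicate a confident match,
    which is what lets a normalized similarity double as a confidence signal.
    """
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps  # eps guards flat patches
    return float((a * b).sum() / denom)

p = np.arange(9, dtype=np.float64).reshape(3, 3)
print(zncc(p, p))   # an identical patch matches itself: ~1.0
print(zncc(p, -p))  # an inverted patch anti-correlates: ~-1.0
```

Because the result is bounded, a simple rescaling such as `(z + 1) / 2` would map it into the 0-to-1 range mentioned in the abstract; how the paper actually derives its confidence from ZNCC is defined in the paper itself.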
3

Visual space attention in three-dimensional space

Tucker, Andrew James, n/a January 2006 (has links)
Current models of visual spatial attention are based on the extent to which attention can be allocated in 2-dimensional displays. The distribution of attention in 3-dimensional space has received little consideration. A series of experiments were devised to explore the apparent inconsistencies in the literature pertaining to the allocation of spatial attention in the third dimension. A review of the literature attributed these inconsistencies to differences and limitations in the various methodologies employed, in addition to the use of differing attentional paradigms. An initial aim of this thesis was to develop a highly controlled novel adaptation of the conventional robust covert orienting of visual attention task (COVAT) in depth defined by either binocular (stereoscopic) or monocular cues. The results indicated that attentional selection in the COVAT is not allocated within a 3-dimensional representation of space. Consequently, an alternative measure of spatial attention in depth, the overlay interference task, was successfully validated in a different stereoscopic depth environment and then manipulated to further examine the allocation of attention in depth. Findings from the overlay interference experiments indicated that attentional selection is based on a representation that includes depth information, but only when an additional feature can aid 3D selection. Collectively, the results suggest a dissociation between two paradigms that are both purported to be measures of spatial attention. There appears to be a further dissociation between 2-dimensional and 3-dimensional attentional selection in both paradigms for different reasons. These behavioural results, combined with recent electrophysiological evidence suggest that the temporal constraints of the 3D COVAT paradigm result in early selection based predominantly on retinotopic spatial coordinates prior to the complete construction of a 3-dimensional representation. 
Task requirements of the 3D overlay interference paradigm, on the other hand, while not being restricted by temporal constraints, demand that attentional selection occurs later, after the construction of a 3-dimensional representation, but only with the guidance of a secondary feature. Regardless of whether attentional selection occurs early or late, however, some component of selection appears to be based on viewer-centred spatial coordinates.
4

空間注意力經由深度影響模稜運動知覺 / The effect of spatial attention on multistable motion perception via the depth mechanism

孫華君, Sun, Hua Chun Unknown Date (has links)
Many studies have found that fixating on or directing spatial attention to different regions can bias the perception of the Necker cube, but whether this effect of spatial attention is due to attended areas being perceived as closer has yet to be examined. This issue was directly investigated in this study. The stimulus used was the diamond stimulus, containing four occluders and four moving lines that can be perceived as coherent or separate motions. The results of Experiment 1 show that coherent motion was perceived more often under the attending-to-occluders condition than under the attending-to-moving-lines condition, indicating that spatial attention can bias multistable perception. The results of Experiment 2 show that the mean probability of reporting lines behind occluders at small binocular disparities was significantly higher under the attending-to-occluders condition than under the attending-to-lines condition, indicating that spatial attention can make attended areas look slightly closer. The results of Experiments 3 and 4 show that the effect of spatial attention on biasing multistable perception was weakened when binocular or monocular depth cues defined the depth relationship between the occluders and the lines. These results are all consistent with the notion that spatial attention can bias multistable perception by affecting depth perception, making attended areas look closer.
5

Depth Estimation Using Adaptive Bins via Global Attention at High Resolution

Bhat, Shariq 21 April 2021 (has links)
We address the problem of estimating a high quality dense depth map from a single RGB input image. We start out with a baseline encoder-decoder convolutional neural network architecture and pose the question of how the global processing of information can help improve overall depth estimation. To this end, we propose a transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image. The final depth values are estimated as linear combinations of the bin centers. We call our new building block AdaBins. Our results show a decisive improvement over the state-of-the-art on several popular depth datasets across all metrics. We also validate the effectiveness of the proposed block with an ablation study.
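The final step the abstract describes — depth values formed as linear combinations of adaptively estimated bin centers — can be sketched as follows. This is a hedged NumPy illustration of the idea only; the bin-width parameterization, the `d_min`/`d_max` range, and the tensor shapes are assumptions, not the paper's exact formulation:

```python
import numpy as np

def adabins_depth(bin_logits, pixel_probs, d_min=0.1, d_max=10.0):
    """Sketch of AdaBins-style depth: adaptive bin widths -> centers -> depth.

    bin_logits:  (N,) per-image logits defining relative bin widths.
    pixel_probs: (H, W, N) per-pixel probabilities over the N bins.
    """
    widths = np.exp(bin_logits) / np.exp(bin_logits).sum()  # normalized bin widths
    edges = d_min + (d_max - d_min) * np.cumsum(widths)     # adaptive bin edges
    edges = np.concatenate([[d_min], edges])
    centers = 0.5 * (edges[:-1] + edges[1:])                # per-image bin centers
    return pixel_probs @ centers                            # linear combination -> (H, W)
```

With uniform logits and uniform per-pixel probabilities, every pixel's depth collapses to the midpoint of the range, which is a quick sanity check that the linear combination behaves as intended.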
6

Semantic Segmentation For Free Drive-able Space Estimation

Gallagher, Eric 02 October 2020 (has links)
Autonomous vehicles need precise information about the drive-able space in order to navigate safely. In recent years, deep learning and semantic segmentation have attracted intense research; the field is rapidly evolving and continues to provide excellent results. Research has shown that deep learning is emerging as a powerful tool in many applications. The aim of this study is to develop a deep learning system to estimate the free drive-able space. Building on state-of-the-art deep learning techniques, semantic segmentation is used to replace the need for highly accurate maps that are expensive to license. Free drive-able space is defined as the drive-able space on the correct side of the road that can be reached without a collision with another road user or pedestrian. A state-of-the-art deep network is trained on a custom dataset in order to learn complex driving decisions. Motivated by good results, further deep learning techniques are applied to measure distance from monocular images. The findings demonstrate the power of deep learning techniques in complex driving decisions. The results also indicate the economic and technical feasibility of semantic segmentation over expensive high-definition maps.
7

Monocular Depth Estimation: Datasets, Methods, and Applications

Bauer, Zuria 15 September 2021 (has links)
The World Health Organization (WHO) stated in February 2021 at the Seventy-Third World Health Assembly that, globally, at least 2.2 billion people have a near or distance vision impairment. It also noted the severe impact vision impairment has on the quality of life of the individual suffering from this condition, how it affects their social well-being and economic independence in society, and how in some cases it becomes an additional burden on the people in their immediate surroundings. In order to minimize the cost and intrusiveness of assistive applications and maximize the autonomy of the individual's life, the natural solution is to use systems that rely on computer vision algorithms. Systems improving the quality of life of the visually impaired need to solve different problems such as localization, path recognition, obstacle detection, environment description, and navigation. Each of these topics involves an additional set of problems that must be solved to address it. For example, the task of obstacle detection needs depth prediction to know the distance to the object, path recognition to know whether the user is on the road or on a pedestrian path, an alarm system to provide notifications of danger, and trajectory prediction of the approaching obstacle, and those are only the main points. Taking a closer look at all of these topics, they have one key component in common: depth estimation/prediction. All of them require a correct estimation of the depth in the scene. In this thesis, our main focus is on addressing depth estimation in indoor and outdoor environments. Traditional depth estimation methods, like structure from motion and stereo matching, are built on feature correspondences from multiple viewpoints. Despite the effectiveness of these approaches, they need a specific type of data to perform properly.
Since our main goal is to provide systems with minimal cost and intrusiveness that are also easy to handle, we decided to infer depth from single images: monocular depth estimation. Estimating the depth of a scene from a single image is a simple task for humans, but it is notoriously difficult for computational models to achieve both high accuracy and low resource requirements. Monocular depth estimation is this very task of estimating depth from a single RGB image. Since only one image is needed, this approach is used in applications such as autonomous driving, scene understanding, or 3D modeling, where other types of information are not available. This thesis presents contributions towards solving this task using deep learning as the main tool. The four main contributions of this thesis are: first, we carry out an extensive review of the state of the art in monocular depth estimation; second, we introduce a novel large-scale, high-resolution outdoor stereo dataset able to provide enough image information to solve various common computer vision problems; third, we present a set of architectures able to predict monocular depth effectively; and finally, we propose two real-life applications of those architectures, addressing the topic of enhancing perception for the visually impaired using low-cost wearable sensors.
8

Monocular Depth Estimation with Edge-Based Constraints and Active Learning

January 2019 (has links)
abstract: The ubiquity of single-camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated by feature-detection algorithms, as opposed to the regular-grid sparse patterns used in previous work. This work uses these feature-based sparse patterns to generate additional depth information by interpolating regions between clusters of samples that are in close proximity to each other. These interpolated sparse depths are used to enforce additional constraints on the network's predictions. In addition to the improved depth prediction observed from incorporating the sparse sample information compared to pure RGB-based methods, the experiments show that actively retraining a network on a small number of samples that deviate most from the interpolated sparse depths leads to better depth prediction overall. This thesis also introduces a new metric, titled Edge, to quantify model performance in regions of an image that show the highest change in ground-truth depth values along either the x-axis or the y-axis. Existing metrics in depth estimation, like Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), quantify model performance across the entire image and do not focus on the specific regions of an image that are hard to predict. To this end, the proposed Edge metric focuses specifically on these hard-to-predict regions. The experiments also show that adding the Edge metric to existing loss functions, like the L1 loss in current state-of-the-art methods, leads to vastly improved performance in these hard-to-predict regions, while also improving performance across the board on every other metric. / Dissertation/Thesis / Masters Thesis Computer Engineering 2019
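An edge-focused metric of the kind described above might look roughly like the sketch below. This is a hypothetical illustration only; the thesis defines the exact formulation, and the `top_frac` threshold and the choice of RMSE as the per-region error are assumptions:

```python
import numpy as np

def edge_metric(pred, gt, top_frac=0.1):
    """Error restricted to the pixels with the largest ground-truth depth
    change along the x- or y-axis (the 'hard' high-gradient regions)."""
    gy, gx = np.gradient(gt)                       # depth change along y and x
    change = np.maximum(np.abs(gx), np.abs(gy))    # strongest change per pixel
    k = max(1, int(top_frac * change.size))        # keep the top fraction
    thresh = np.partition(change.ravel(), -k)[-k]
    mask = change >= thresh
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))
```

Restricting the error to the masked pixels is what distinguishes such a metric from plain RMSE, which averages over the whole image and dilutes errors at depth discontinuities.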
9

Monocular Depth Estimation with Edge-Based Constraints using Active Learning Optimization

Saleh, Shadi 04 April 2024 (has links)
Depth sensing is pivotal in robotics; however, monocular depth estimation encounters significant challenges. Existing algorithms' reliance on large-scale labeled data and large Deep Convolutional Neural Networks (DCNNs) hinders real-world applications. We propose two lightweight architectures that achieve commendable accuracy rates of 91.2% and 90.1% while reducing the Root Mean Square Error (RMSE) of depth to 4.815 and 5.036. Our lightweight depth model operates at 29-44 FPS on the Jetson Nano GPU, showcasing efficient performance with minimal power consumption. Moreover, we introduce a mask network designed to visualize and analyze the compact depth network, aiding in discerning informative samples for the active learning approach. This contributes to increased model accuracy and enhanced generalization capabilities. Furthermore, our methodology introduces an active learning framework strategically designed to enhance model performance and accuracy by efficiently utilizing limited labeled training data. This novel framework outperforms previous studies by achieving commendable results using only 18.3% of the KITTI Odometry dataset. This performance reflects a skillful balance between computational efficiency and accuracy, tailored for low-cost devices while reducing data training requirements. / Contents: 1. Introduction 2. Literature Review 3. AI Technologies for Edge Computing 4. Monocular Depth Estimation Methodology 5. Implementation 6. Result and Evaluation 7. Conclusion and Future Scope Appendix
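The selection step at the heart of such an active learning loop — labeling only the samples judged most informative — can be sketched generically. This is a hypothetical illustration; in the thesis the acquisition score comes from its mask network, whereas here `scores` is just an assumed per-sample informativeness value:

```python
def select_informative(scores, budget):
    """Rank unlabeled samples by an acquisition score (higher = judged more
    informative) and return the indices of the top `budget` samples to label
    and add to the training set next."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:budget]

# e.g. label only the 2 highest-scoring of 5 unlabeled samples
picked = select_informative([0.12, 0.91, 0.45, 0.08, 0.77], 2)
print(picked)  # [1, 4]
```

Repeating this select-label-retrain cycle is what lets a model reach good accuracy while touching only a fraction of the dataset, as in the 18.3% figure quoted above.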
10

Monocular Depth Prediction in Deep Neural Networks

Tang, Guanqian January 2019 (has links)
With the development of artificial neural networks (ANNs), they have been introduced into more and more computer vision tasks. Convolutional neural networks (CNNs) are widely used in object detection, object tracking, and semantic segmentation, achieving greater performance than traditional algorithms. As a classical topic in computer vision, applying deep CNNs for depth recovery from monocular images is popular, since single-view images are more common than stereo image pairs and video. However, due to the lack of motion and geometry information, monocular depth estimation is much more difficult. This thesis investigates depth prediction from single images by exploiting state-of-the-art deep CNN models. Two neural networks are studied: the first uses the idea of a global and a local network, and the other adopts a deeper fully convolutional network with a pre-trained backbone CNN (ResNet or DenseNet). We compare the performance of the two networks, and the results show that the deeper convolutional neural network with the pre-trained backbone achieves better performance. The pre-trained model can significantly accelerate the training process. We also find that the amount of training data is essential for CNN-based monocular depth prediction. / The development of artificial neural networks (ANNs) has led to their use in several computer vision techniques to improve performance. Convolutional Neural Networks (CNNs) are often used in object detection, object tracking, and semantic segmentation, and perform better than earlier algorithms. The use of CNNs for single-image depth prediction has become popular because single images are more common than stereo images and video. Owing to the lack of motion and geometric information, it is much harder to estimate depth in an image than in a video. The aim of this master's thesis is to implement a new algorithm for depth prediction, specifically for images, using CNN models. Two different neural networks were analyzed: the first uses a local and a global network, and the second consists of an advanced convolutional neural network that uses a pre-trained backbone CNN (ResNet or DenseNet). Our analyses show that the advanced convolutional neural network with a pre-trained backbone performs better and speeds up the training process considerably. We also found that the amount of data for training is crucial for CNN-based monocular depth prediction.
