1 |
Tomographic inversion of traveltime data in reflection seismology. Williamson, P. R. January 1986
No description available.
|
2 |
Temporally consistent semantic segmentation in videos. Raza, Syed H. 08 June 2015
The objective of this thesis research is to develop algorithms for temporally consistent semantic segmentation in videos. Though many different forms of semantic segmentation exist, this research focuses on the problem of temporally consistent holistic scene understanding in outdoor videos. Holistic scene understanding requires an understanding of many individual aspects of the scene, including 3D layout, the objects present, occlusion boundaries, and depth. Such a description of a dynamic scene would be useful for many robotic applications, including object reasoning, 3D perception, video analysis, video coding, segmentation, navigation, and activity recognition.
Scene understanding has been studied with great success for still images. Scene understanding in videos, however, requires additional approaches to account for temporal variation, exploit dynamic information, and leverage causality. As a first step, image-based scene understanding methods can be applied directly to individual video frames to generate a description of the scene. However, these methods do not exploit temporal information across neighboring frames. Lacking temporal consistency, image-based methods can produce temporally inconsistent labels, which hurts performance as scene labels suddenly change between frames.
The objective of this study is to develop temporally consistent scene description algorithms by processing videos efficiently, exploiting causality and data redundancy, and catering for scene dynamics. Specifically, we achieve our research objectives by (1) extracting geometric context from videos to give the broad 3D structure of the scene with all objects present, (2) detecting occlusion boundaries in videos due to depth discontinuities, and (3) estimating depth in videos by combining monocular and motion features with semantic features and occlusion boundaries.
|
3 |
Scene depth estimation from a camera mounted on a moving car / Εκτίμηση βάθους σκηνής από κάμερα τοποθετημένη σε αυτοκίνητο που κινείται. Καπρινιώτης, Αχιλλέας. 10 June 2014
This master's thesis analyzes the depth estimation of a rigid scene from a camera attached to a moving vehicle. The first chapter gives an introduction to the field of Computer Vision and provides some examples of its applications. The second chapter describes basic principles of projective geometry, which serve as the mathematical background for the following chapters. The third chapter covers the theoretical model of a camera, its parameters, and the distortions that enter this model. The fourth chapter deals with the camera calibration procedure, along with its implementation.
Chapter five presents general categories of stereoscopic algorithms, along with their similarity measures. Chapter six discusses the Harris corner detector and its application both to corner detection and to matching the two images. Chapter seven analyzes the theory of the SIFT algorithm and gives an example of feature detection and matching. Chapter eight highlights basic principles of epipolar geometry and stresses the importance of image rectification. Chapter nine presents the overall procedure that was followed, along with the description and implementation of the depth estimation methods that were used.
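The depth estimation pipeline summarized above ultimately rests on triangulation over matched points in a rectified image pair. As a minimal illustrative sketch (not code from the thesis), the depth of a matched point on a rectified stereo pair follows from its disparity via Z = f * B / d:

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth on a rectified stereo pair: Z = f * B / d,
    with f the focal length in pixels, B the camera baseline in metres,
    and d the disparity in pixels. Zero disparity maps to infinity."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)

# Illustrative values only: f = 700 px, B = 0.54 m (roughly a KITTI-like rig).
depths = depth_from_disparity([7.0, 70.0, 0.0], focal_px=700.0, baseline_m=0.54)
```

The focal length and baseline here are assumptions chosen for the example; in practice both come from the calibration procedure described in chapter four.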
|
4 |
Domain-Independent Moving Object Depth Estimation using Monocular Camera / Domän-oberoende djupestimering av objekt i rörelse med monokulär kamera. Nassir, Cesar. January 2018
Today, automotive companies across the world strive to create vehicles with fully autonomous capabilities. There are many benefits to developing autonomous vehicles, such as reduced traffic congestion, increased safety, and reduced pollution. Many challenges stand in the way of that goal; one of them is visual perception. Estimating depth from a 2D image has been shown to be a key component of 3D recognition, reconstruction, and segmentation. Estimating depth in an image from a monocular camera is an ill-posed problem, since the mapping from colour intensity to depth value is ambiguous. Depth estimation from stereo images has come far compared to monocular depth estimation, and was initially what depth estimation relied on. However, exploiting monocular cues is necessary in scenarios where stereo depth estimation is not possible. We present a novel CNN, BiNet, inspired by ENet, to tackle real-time depth estimation of moving objects using only a monocular camera. It outperforms ENet on the Cityscapes dataset while adding only a small overhead in complexity.
|
5 |
Self-supervised monocular image depth learning and confidence estimation. Chen, L., Tang, W., Wan, Tao Ruan, John, N.W. 17 June 2020
We present a novel self-supervised framework for monocular image depth learning and confidence estimation. Our framework reduces the amount of ground-truth annotation data required for training Convolutional Neural Networks (CNNs), which is often a barrier to the fast deployment of CNNs in many computer vision tasks. Our DepthNet adopts a novel, fully differentiable, patch-based cost function based on Zero-Mean Normalized Cross-Correlation (ZNCC), using multi-scale patches as its matching and learning strategy. This approach greatly increases the accuracy and robustness of the depth learning. Because the patch-based cost function naturally provides a 0-to-1 confidence, it is also used to self-supervise the training of a parallel network for confidence map learning and estimation, exploiting the fact that ZNCC is a normalized measure of similarity that can be interpreted as the confidence of the depth estimate. The confidence network therefore trains in a self-supervised manner and runs in parallel with the DepthNet. Evaluations on the KITTI depth prediction benchmark and the Make3D dataset show that our method outperforms state-of-the-art results.
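The ZNCC similarity at the heart of the cost function can be sketched in a few lines. This is a generic illustration, not the authors' implementation, and the (score + 1) / 2 mapping to a 0-to-1 confidence is an assumption made here for the example:

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-Mean Normalized Cross-Correlation between two equal-size
    patches. Both patches are mean-centred and normalised by their
    energy, so the score lies in [-1, 1] and is invariant to affine
    intensity changes (gain and bias)."""
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# A gain-and-bias change of the same patch still correlates to ~1.
p = np.array([[1.0, 2.0], [3.0, 4.0]])
score = zncc(p, 2.0 * p + 5.0)
confidence = (score + 1.0) / 2.0   # assumed mapping to a 0-to-1 range
```

The invariance to gain and bias is what makes ZNCC robust to lighting changes between views, which is why it is attractive as both a matching cost and a confidence proxy.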
|
6 |
Horizontal to vertical spectral ratio of seismic ambient noise: Estimating the depth of a mine tailing / Horisontellt och vertikalt spektralförhållande för seismiskt omgivningsljud: Uppskattning av tjockleken på gruvavfall. Hellerud, Niels. January 2024
As the world moves towards greener technology and energy resources, the need for rare earth elements (REEs) has increased rapidly. A potential secondary resource for REEs is mine tailings, and one technique for estimating the thickness of a tailing is the horizontal-to-vertical spectral ratio (HVSR) method. In this project, the depth of a mine tailing along a profile in Blötberget was estimated using this method. The HVSR method is a non-invasive, environmentally friendly seismic method that utilizes the Earth's ambient noise. It uses three-component seismic sensors, which measure ground motion in three directions. The acquired data were processed in the Geopsy software, where parameters such as filtering and window selection are set to produce the most satisfactory results. The Geopsy software provides the user with HVSRs for the selected windows. This ratio forms a curve in the frequency domain from which a fundamental resonant frequency can be derived. In the case of a strong velocity contrast, the fundamental frequency appears as the sharp, lowest-frequency peak in the data, and it must fulfil certain criteria to be considered reliable. When the fundamental resonant frequencies could be determined reliably, they were converted into tailing thickness with a simple formula in Excel, using the shear-wave velocity of the overlying layer and the fundamental frequency. The elevation at each sensor location and the thickness above the contrasting interface are used to produce a 2-D depth profile of the mine tailing. This profile was compared with radiomagnetotelluric measurements; although the measurement locations did not coincide, reasonable results were obtained.
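The formula relating the fundamental resonance frequency to layer thickness is commonly the quarter-wavelength relation h = Vs / (4 * f0). A minimal sketch assuming this standard form (the exact Excel formula used in the thesis is not given here):

```python
def tailing_thickness(vs_mps, f0_hz):
    """Quarter-wavelength relation: thickness h = Vs / (4 * f0), where
    Vs is the shear-wave velocity of the soft layer (m/s) and f0 the
    fundamental HVSR resonance frequency (Hz)."""
    return vs_mps / (4.0 * f0_hz)

# Illustrative values: Vs = 200 m/s and f0 = 2.5 Hz give a 20 m thick layer.
h = tailing_thickness(vs_mps=200.0, f0_hz=2.5)
```

The velocity and frequency values above are assumptions for illustration; in the study both come from the site itself (Vs of the tailings and the picked HVSR peak).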
|
7 |
Using Texture Features To Perform Depth Estimation. Kotha, Bhavi Bharat. 22 January 2018
There is a great need in real-world applications for estimating depth through electronic means without human intervention. Many methods exist for finding depth measurements autonomously, such as LiDAR and radar. One of the most researched topics in the field of depth measurement is computer vision, which applies techniques to 2D images to achieve the desired result. Among the many 3D vision techniques, stereovision is a field in which a great deal of research is being done to solve this kind of problem. Human vision is an important inspiration behind the research performed in this field.
Stereovision gives a very high spatial resolution of depth estimates, which is used for obstacle avoidance, path planning, object recognition, and similar tasks. Stereovision makes use of an image pair: two images taken with two cameras from different views, which are processed together to recover depth information.
Processing stereo images has been one of the most intensively pursued research topics in computer vision. Many factors affect the performance of this approach, such as computational efficiency, depth discontinuities, lighting changes, correspondence and correlation, and electronic noise.
An algorithm is proposed that uses texture features obtained with Laws' energy masks and a multi-block approach to perform correspondence matching between a stereo pair of images with a wide baseline. Disparity maps are then formed to obtain the relative depth of pixels in the image. An analysis is also made between this approach and current state-of-the-art algorithms. A robust method to score and rank stereo algorithms is also proposed, providing a simple way for researchers to rank algorithms according to their application needs. / Master of Science / There is a great need in real-world applications for estimating depth through electronic means without human intervention. Many methods exist for finding depth measurements autonomously, such as LiDAR and radar. One of the most researched topics in the field of depth measurement is computer vision, which applies techniques to 2D images to achieve the desired result. Among the many 3D vision techniques, stereovision is a field in which a great deal of research is being done to solve this kind of problem. Human vision plays an important part in the inspiration for and research performed in this field. A large variety of algorithms are being developed to measure the depth of, ideally, every point in the pictured scene, giving a very high spatial resolution compared with other methods.
Real-world needs for depth estimation and the benefits of stereo vision are the main driving forces behind this approach. Stereovision gives a very high spatial resolution, which is used for obstacle avoidance, path planning, object recognition, and similar tasks. It uses image pairs taken from two cameras with different perspectives (a translational change in view), which are processed together to estimate depth. The software tool developed is a new approach to performing correspondence matching to find depth using stereo vision concepts.
The software tool developed in this work is written in MATLAB. The tool's efficiency was evaluated using standard techniques, which are described in detail. The evaluation also used images collected with a pair of stereo cameras, with the depth of an object measured by hand with a tape measure for reference. A scoring method is also proposed to rank algorithms in the field of stereo vision.
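Laws' energy masks mentioned above are built from small 1-D kernels combined into 2-D masks via outer products. A minimal sketch of the standard 5-tap construction (the thesis's multi-block matching on top of these features is not reproduced here):

```python
import numpy as np
from itertools import product

# Laws' standard 1-D 5-tap kernels: Level, Edge, Spot, Ripple.
KERNELS = {
    "L5": np.array([1, 4, 6, 4, 1], dtype=float),
    "E5": np.array([-1, -2, 0, 2, 1], dtype=float),
    "S5": np.array([-1, 0, 2, 0, -1], dtype=float),
    "R5": np.array([1, -4, 6, -4, 1], dtype=float),
}

def laws_masks():
    """Build the 16 5x5 Laws masks as outer products of the 1-D kernels;
    e.g. the L5E5 mask responds to edges in one orientation while the
    transposed E5L5 mask responds to the perpendicular one."""
    return {a + b: np.outer(KERNELS[a], KERNELS[b])
            for a, b in product(KERNELS, repeat=2)}

masks = laws_masks()
```

In texture-energy pipelines each mask is convolved with the image and the absolute (or squared) responses are pooled over a window, giving one feature channel per mask for matching.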
|
8 |
Single image scene-depth estimation based on self-supervised deep learning : For perception in autonomous heavy duty vehicles. Piven, Yegor. January 2021
Depth information is a vital component for perceiving the 3D structure of a vehicle's surroundings in the autonomous scenario. The ubiquity and relatively low cost of camera equipment make image-based depth estimation very attractive compared to specialised sensors. Classical image-based depth estimation approaches typically rely on multi-view geometry, requiring alignment and calibration between multiple image sources, which is both cumbersome and error-prone. Single images, in contrast, lack both temporal information and multi-view correspondences, and depth information is lost in the projection from the 3D world to a 2D image, making the single-image depth estimation problem ill-posed. In recent years, deep-learning-based approaches have been widely proposed for single image depth estimation. The problem is typically tackled in a supervised manner, requiring image data with pixel-wise depth information; acquiring large amounts of such data that is both varied and accurate is a laborious and costly task. As an alternative, a number of self-supervised approaches show that models for single image depth estimation can be trained on synchronized stereo image pairs or sequences of monocular images instead of depth-labeled data. This thesis investigates the self-supervised approach using sequences of monocular images, training and evaluating one of the state-of-the-art methods on both the popular public KITTI dataset and the data of the host company, Scania. A number of extensions are implemented for the chosen method, namely weak supervision with velocity data, geometry consistency constraints, and a self-attention mechanism.
The resulting models showed good depth estimation performance on major components of the scene, such as nearby roads and buildings, but struggled at longer ranges and with dynamic objects and thin structures. Minor qualitative and quantitative improvements were observed with the introduction of the geometry consistency loss and mask, as well as the self-attention mechanism. Qualitative improvements included an enhanced ability to identify clear object boundaries and to distinguish objects from their background. The geometry consistency loss also proved informative in low-texture regions of the image and resolved artifacting behaviour observed when training models on Scania's data. Supervising the predicted translations with velocity data proved effective at enforcing the metric scale of the depth network's predictions; however, a risk of overfitting to this supervision was observed when training on Scania's data. To resolve this issue, a velocity-supervised fine-tuning procedure is proposed as an alternative to velocity-supervised training from scratch, removing the observed overfitting while still enabling the model to learn the metric scale. The proposed fine-tuning procedure was effective even when training models on the KITTI dataset, where no overfitting was observed, suggesting its general applicability.
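One common way to impose metric scale with velocity data, as in the weak supervision described above, is to penalise the gap between the pose network's predicted inter-frame translation magnitude and the distance implied by the measured speed. The exact loss used in the thesis is not given here; the following is an illustrative sketch of that idea:

```python
import numpy as np

def velocity_supervision_loss(pred_translation, speed_mps, dt_s):
    """Illustrative weak-supervision term for metric scale: penalise the
    gap between the predicted inter-frame translation magnitude and the
    distance implied by the measured vehicle speed (|t| vs. v * dt)."""
    t = np.asarray(pred_translation, dtype=float)
    expected = np.asarray(speed_mps, dtype=float) * np.asarray(dt_s, dtype=float)
    return float(np.mean(np.abs(np.linalg.norm(t, axis=1) - expected)))

# A translation whose magnitude matches v * dt incurs zero loss.
loss = velocity_supervision_loss([[0.0, 0.0, 1.0]], speed_mps=[10.0], dt_s=[0.1])
```

Because monocular self-supervision only constrains depth and pose up to a shared scale, anchoring the translation magnitude in this way propagates metric scale to the depth predictions as well.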
|
9 |
Applied statistical modeling of three-dimensional natural scene data. Su, Che-Chun. 27 June 2014
Natural scene statistics (NSS) have played an increasingly important role both in our understanding of the function and evolution of the human vision system and in the development of modern image processing applications. Because depth/range, i.e., egocentric distance, is arguably the most important quantity a visual system must compute (from an evolutionary perspective), the joint statistics between natural image and depth/range information are of particular interest. However, while regular and reliable statistical models exist for two-dimensional (2D) natural images, little work has been done on statistical modeling of natural luminance/chrominance and depth/disparity, and of their mutual relationships. One major reason is the dearth of high-quality three-dimensional (3D) image and depth/range databases. To facilitate research progress on 3D natural scene statistics, this dissertation first presents a high-quality database of color images and accurately co-registered depth/range maps, acquired with an advanced laser range scanner carrying a high-end digital single-lens reflex camera. Using this high-resolution, high-quality database, the dissertation performs reliable and robust statistical modeling of natural image and depth/disparity information, including new bivariate and spatially oriented correlation models. In particular, these models capture higher-order dependencies embedded in spatially adjacent bandpass responses projected from natural environments, which have not yet been well understood or explored in the literature. To demonstrate the efficacy and effectiveness of the advanced NSS models, the dissertation addresses two challenging yet important problems: depth estimation from monocular images and no-reference stereoscopic/3D (S3D) image quality assessment.
A Bayesian depth estimation framework is proposed that exploits the canonical depth/range patterns in natural scenes, forming priors and likelihoods from both univariate and bivariate NSS features. The no-reference S3D image quality index proposed in this dissertation exploits new bivariate and correlation NSS features to quantify different types of stereoscopic distortion. Experimental results show that the proposed framework and index achieve superior performance to state-of-the-art algorithms in both disciplines.
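The Bayesian structure described above, priors from canonical depth patterns combined with NSS-based likelihoods, reduces at its core to maximum-a-posteriori selection. A toy sketch of that structure only; the candidate depths, prior, and likelihood values below are invented for illustration and are not the dissertation's models:

```python
import numpy as np

def map_depth(candidates, log_prior, log_likelihood):
    """Maximum-a-posteriori choice over a discrete set of candidate
    depths: log posterior = log likelihood + log prior, up to a constant."""
    scores = np.asarray(log_likelihood) + np.asarray(log_prior)
    return float(candidates[int(np.argmax(scores))])

# Invented toy numbers: the prior favours the mid-range depth, the
# likelihood (from image features) favours the far one; together the
# mid-range candidate wins.
candidates = np.array([2.0, 5.0, 10.0])
log_prior = np.log([0.2, 0.6, 0.2])
log_likelihood = np.log([0.1, 0.3, 0.6])
depth = map_depth(candidates, log_prior, log_likelihood)
```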
|
10 |
Computational Imaging For Miniature Cameras. Salahieh, Basel. January 2015
Miniature cameras play a key role in numerous imaging applications, ranging from endoscopy and metrology inspection devices to smartphones and head-mounted acquisition systems. However, due to physical constraints, the imaging conditions, and the low quality of small optics, their imaging capabilities are limited in terms of delivered resolution, acquired depth of field, and captured dynamic range. Computational imaging jointly addresses the imaging system and the reconstruction algorithms to bypass the traditional limits of optical systems and deliver better restorations for various applications. The scene is encoded into a set of efficient measurements, which can then be computationally decoded to output a richer estimate of the scene than the raw images captured by conventional imagers. In this dissertation, three task-based computational imaging techniques are developed to make low-quality miniature cameras capable of delivering realistic high-resolution reconstructions, providing full-focus imaging, and acquiring depth information for high-dynamic-range objects. For the superresolution task, a non-regularized direct superresolution algorithm is developed to achieve realistic restorations without being penalized by improper assumptions (e.g., optimizers, priors, and regularizers) made in the inverse problem. An adaptive frequency-based filtering scheme is introduced to upper-bound the reconstruction errors while still producing finer details than previous methods under realistic imaging conditions. For the full-focus imaging task, a computational depth-based deconvolution technique is proposed to bring a scene captured by an ordinary fixed-focus camera into full focus, based on a depth-variant point-spread-function prior.
Ringing artifacts are suppressed on three levels: block tiling to eliminate boundary artifacts, adaptive reference maps to reduce ringing initiated by sharp edges, and block-wise deconvolution or depth-based masking to suppress artifacts initiated by neighboring depth-transition surfaces. Finally, for the depth acquisition task, a multi-polarization fringe projection imaging technique is introduced to eliminate saturated points and enhance fringe contrast by selecting the proper polarized-channel measurements. The developed technique can easily be extended to include measurements captured under different exposure times, yielding more accurate shape rendering for very high-dynamic-range objects.
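Depth-based deconvolution builds on frequency-domain deblurring. A generic Wiener deconvolution sketch, a building block only; the dissertation's depth-variant, ringing-suppressed pipeline is considerably more elaborate:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Frequency-domain Wiener deconvolution: W = conj(H) / (|H|^2 + nsr),
    where H is the transfer function of the PSF and nsr an assumed
    noise-to-signal power ratio that tames near-zero frequencies
    (and hence ringing)."""
    H = np.fft.fft2(psf, s=blurred.shape)
    G = np.fft.fft2(blurred)
    restored = np.fft.ifft2(np.conj(H) / (np.abs(H) ** 2 + nsr) * G)
    return np.real(restored)

# Toy example: blur an impulse with a 3x3 box PSF, then restore it.
img = np.zeros((16, 16))
img[8, 8] = 1.0
psf = np.ones((3, 3)) / 9.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf, s=img.shape)))
restored = wiener_deconvolve(blurred, psf)
```

The nsr regularizer plays the same qualitative role as the dissertation's ringing-suppression measures: without it, dividing by near-zero frequency components of the PSF amplifies noise into ringing.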
|