31 |
Monocular depth estimation by deep learning for autonomous vehicles: influence of depth map sparsity in supervised training. Rosa, Nícolas dos Santos, 24 June 2019
This work addresses the problem of single image depth estimation (SIDE), focusing on improving the quality of deep neural network predictions. In a supervised learning scenario, the quality of predictions is intrinsically related to the training labels, which guide the optimization process. For indoor scenes, structured-light depth sensors (e.g., Kinect) can provide dense, albeit short-range, depth maps. For outdoor scenes, by contrast, LiDAR is considered the standard sensor, and it provides much sparser measurements, especially in more distant regions. Rather than modifying the neural network architecture to deal with sparse depth maps, this work introduces a novel densification method for depth maps using the Hilbert Maps framework. A continuous occupancy map is produced from the 3D points of the LiDAR scans, and the resulting reconstructed surface is projected into a 2D depth map of arbitrary resolution. Experiments conducted on various subsets of the KITTI dataset show a significant improvement produced by the proposed Sparse-to-Continuous technique, without introducing extra information into the training stage.
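The projection step the abstract describes can be pictured with a short sketch. This is a minimal illustration, not the thesis's implementation: the Hilbert Maps occupancy model is replaced by a plain z-buffer projection of LiDAR points assumed to be already expressed in camera coordinates, with the intrinsic matrix `K` given.

```python
import numpy as np

def project_to_depth_map(points_cam, K, height, width):
    """Project 3D LiDAR points (N, 3), given in camera coordinates,
    onto the image plane, keeping the nearest depth per pixel."""
    z = points_cam[:, 2]
    pts = points_cam[z > 0.1]                    # drop points behind the camera
    uvw = (K @ pts.T).T                          # pinhole projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = np.full((height, width), np.inf, dtype=np.float32)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[inside], v[inside], pts[inside, 2]):
        depth[vi, ui] = min(depth[vi, ui], zi)   # z-buffer: keep closest return
    depth[np.isinf(depth)] = 0.0                 # 0 marks pixels with no return
    return depth
```

A densification stage such as the thesis's Sparse-to-Continuous method would then fill the zero-valued pixels from the continuous occupancy surface rather than leave them empty.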
|
32 |
Cyclopean optical flow. Robles Hernández, Maria Fernanda, 04 1900
This thesis is in the field of computer vision, focusing on the problem of optical flow estimation. Optical flow is a notoriously difficult 2D problem, since it is inherently under-constrained. To introduce the concept of cyclopean optical flow, we reduce the 2D problem to 1D, which eliminates the aperture problem tied to 2D motion and makes the concept more accessible. The thesis proposes a new approach based on a "cyclopean" frame of reference. We apply a constrained gradient-based technique to solve 1D optical flow, where the constraints are gradient behavior and a correlation score. The thesis focuses on the fundamental problem of ensuring that the gradient remains usable over an interval large enough to cover the spatial displacement of the motion. The proposed "cyclopean" approach does not enforce optical measurements over a fixed grid, which yields more reliable results. To further increase the allowed motion interval, we propose a pyramidal constraint that allows solving in a coarse-to-fine manner. We evaluate on aerial imagery, on the Sintel dataset, and on the Sintel dataset with displacements artificially reduced to 10% of the ground truth. This work is developed in the "continuous" framework commonly used for small-motion optical flow. Our results show good management of false positives while maintaining good convergence density; we consider the reliability of the motion measurements to be very high, which in many applications matters at least as much as precision itself. However, our method is not as precise as the current state of the art on standard benchmarks, as it specializes in very small motions. It is also worth noting the versatility that comes with the "continuous" representation: it allows us to select which regions to solve, opening the possibility of adapting across the spectrum from sparse to dense optical flow. From the standpoint of this study, traditional methods remain relevant even in the deep learning era, offering a new set of tools in the pursuit of solving optical flow, and this approach, which complements traditional methods, may pave the way for new deep learning approaches.
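As a concrete reference point, here is a minimal sketch of the classical gradient-based 1D refinement that such an approach builds on. It is not the thesis's cyclopean formulation: the gradient-behavior and correlation constraints are omitted, leaving only the Newton-style update whose validity depends on the gradient being usable over the displacement interval.

```python
import numpy as np

def flow_1d(i1, i2, x, n_iter=20):
    """Estimate the displacement d such that i2(x + d) ≈ i1(x),
    by gradient-based Newton refinement (1D Lucas-Kanade style)."""
    d = 0.0
    for _ in range(n_iter):
        xs = x + d
        x0 = int(np.clip(np.floor(xs), 0, len(i2) - 2))
        a = xs - x0
        val = (1 - a) * i2[x0] + a * i2[x0 + 1]   # linearly interpolated i2(x+d)
        grad = i2[x0 + 1] - i2[x0]                # local gradient of i2
        if abs(grad) < 1e-6:                      # gradient no longer usable here
            break
        d += (i1[x] - val) / grad                 # Newton step toward i1(x)
    return d

# A coarse-to-fine (pyramidal) scheme runs this on downsampled copies of
# i1 and i2 first, using each level's d to seed the next finer level.
```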
|
33 |
Monocular Depth Estimation with Edge-Based Constraints using Active Learning Optimization. Saleh, Shadi, 04 April 2024
Depth sensing is pivotal in robotics; however, monocular depth estimation still faces significant challenges. Existing algorithms rely on large-scale labeled data and large Deep Convolutional Neural Networks (DCNNs), which hinders real-world deployment. We propose two lightweight architectures that achieve commendable accuracy rates of 91.2% and 90.1% while reducing the Root Mean Square Error (RMSE) of depth to 4.815 and 5.036. Our lightweight depth model runs at 29-44 FPS on the Jetson Nano GPU, showcasing efficient performance with minimal power consumption.
Moreover, we introduce a mask network designed to visualize and analyze the compact depth network, aiding in discerning informative samples for the active learning approach. This contributes to increased model accuracy and enhanced generalization capabilities.
Furthermore, our methodology introduces an active learning framework strategically designed to enhance model performance and accuracy by efficiently using limited labeled training data. This novel framework outperforms previous studies, achieving commendable results with only 18.3% of the KITTI Odometry dataset. This performance reflects a careful balance between computational efficiency and accuracy, tailored to low-cost devices while reducing training data requirements.
1. Introduction
2. Literature Review
3. AI Technologies for Edge Computing
4. Monocular Depth Estimation Methodology
5. Implementation
6. Result and Evaluation
7. Conclusion and Future Scope
Appendix
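A generic pool-based active-learning round, sketched below, conveys the selection idea; the informativeness score stands in for the dissertation's mask-network criterion, and the function names are illustrative rather than taken from the thesis.

```python
import numpy as np

def select_informative(scores, unlabeled_ids, budget):
    """Return the `budget` sample ids with the highest informativeness
    score (e.g., predictive uncertainty on the unlabeled pool)."""
    order = np.argsort(scores)[::-1]              # most informative first
    return [unlabeled_ids[i] for i in order[:budget]]

# One active-learning round (training/scoring left abstract on purpose):
# 1. train the depth model on the current labeled pool
# 2. score each unlabeled sample for informativeness
# 3. label the selected samples, add them to the pool, repeat
picked = select_informative(np.array([0.2, 0.9, 0.5]), ["a", "b", "c"], budget=2)
print(picked)  # ['b', 'c']
```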
|
34 |
Multi-view Video Coding Via Dense Depth Field. Ozkalayci, Burak Oguz, 01 September 2006
Emerging 3-D applications and 3-D display technologies raise transmission problems for next-generation multimedia data. Multi-view Video Coding (MVC) is one of the challenging topics in this area and is on the road to standardization via ISO MPEG. In this thesis, a 3-D geometry-based MVC approach is proposed and analyzed in terms of its compression performance. For this purpose, the overall study is partitioned into three consecutive parts. The first step is dense depth estimation of a view from a fully calibrated multi-view set. The calibration information and smoothness assumptions are utilized to determine dense correspondences via a Markov Random Field (MRF) model, which is solved by the Belief Propagation (BP) method. In the second part, the estimated dense depth maps are utilized to generate (predict) arbitrary views of a scene from other cameras, a task known as novel view generation. A 3-D warping algorithm, followed by an occlusion-compatible hole-filling process, is implemented for this purpose. In order to suppress occlusion artifacts, an intermediate novel view generation method, which fuses two novel views generated from different source views, is developed. Finally, in the last part, the dense depth estimation and intermediate novel view generation tools are utilized in the proposed H.264-based MVC scheme to remove the spatial redundancies between different views. The performance of the proposed approach is compared against simulcast coding and a recent MVC proposal, which is expected to become the standard recommendation of MPEG in the near future. The results show that geometric approaches to MVC can still be utilized, especially in certain 3-D applications, in addition to conventional temporal motion compensation techniques, although the rate-distortion performance of geometry-free approaches is quite superior.
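The 3-D warping step can be summarized in a short sketch. This is a minimal illustration under assumed inputs (per-pixel depth, intrinsics `K_src`/`K_dst`, and the source-to-novel rigid motion `R`, `t`); the occlusion-compatible traversal order and the hole filling the abstract mentions are omitted.

```python
import numpy as np

def warp_coords_to_novel_view(depth, K_src, K_dst, R, t):
    """For every source pixel, compute where it lands in the novel view:
    back-project with its depth, apply the rigid motion, re-project."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K_src) @ pix              # back-project to unit-depth rays
    pts = rays * depth.reshape(1, -1)              # 3-D points in the source frame
    pts_dst = R @ pts + t.reshape(3, 1)            # move into the novel camera frame
    proj = K_dst @ pts_dst
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)  # perspective divide
    return uv.reshape(2, h, w)                     # (u', v') per source pixel
```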
|
35 |
Three-dimensional scene reconstruction from a small number of photographs. Φλώρου, Ραφαέλλα; Χατούπης, Σταύρος, 26 April 2012
This thesis was written as part of the undergraduate studies of the department of Electrical and Computer Engineering of the University of Patras. Its objective is the three-dimensional (3D) reconstruction of a scene from at least two photographs of it, a topic within the field of computer vision. More specifically, the thesis analyzes in detail the case of stereo vision in which the camera, between two successive shots of the same scene, has zero relative rotation with respect to its initial position and a small translation of about 5 cm. In this way, we attempt to simulate human vision, which many Artificial Intelligence applications require.
Humans take stereo vision for granted because they move through a three-dimensional world. When that world is recorded by a camera, however, it is reduced to a two-dimensional plane. It is still possible to extract depth information from a single image, but this is done purely empirically, by comparing various textures, shapes, and sizes. A computer identifies an image as it would any other file; it cannot draw conclusions about what the image depicts in the real world. It needs to combine at least two images of the same scene, taken from different positions, in order to recover, for example, the depth of the depicted scene.
This is the process the thesis describes in detail. The first chapter introduces the concept and usefulness of stereo vision. The second chapter presents the basic principles of projective geometry, the mathematical background for passing from the two-dimensional plane to three dimensions. The third chapter covers camera modeling and the parameters that characterize it (intrinsic and extrinsic). The fourth chapter analyzes the camera calibration process. The fifth chapter explains the process of matching points of interest between the two images. The sixth chapter presents the basic principles of epipolar geometry. The seventh chapter presents the experimental procedure for estimating the depth of the scene. The eighth chapter summarizes the 3D reconstruction and presents the corresponding experimental results. The ninth chapter states the conclusions of the whole procedure and how the results could be improved.
Both the theoretical and the experimental parts of this work cover, to a large extent, the key stages of 3D scene reconstruction. The results of the experiments show that the existing methods work satisfactorily, but there is ample room for improvement in computer vision.
We would like to thank our supervising professor, Mr. Dermatas, for his collaboration and his understanding.
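The depth-from-disparity step at the heart of this pipeline can be sketched briefly. The file names, focal length, and block-matching parameters below are placeholders, assuming an already rectified stereo pair produced by the calibration stage described above.

```python
import cv2
import numpy as np

# Rectified grayscale stereo pair (placeholder file names).
imgL = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
imgR = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching along epipolar lines; StereoBM outputs fixed-point
# disparities scaled by 16.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(imgL, imgR).astype(np.float32) / 16.0

f = 700.0   # focal length in pixels (from calibration; example value)
B = 0.05    # baseline in meters (~5 cm, as in the setup described above)
depth = np.where(disparity > 0, f * B / disparity, 0.0)  # Z = f*B/d
```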
|
36 |
Depth estimation from monocular images by deep learning. Moukari, Michel, 01 July 2019
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process, and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It involves characterizing the image both semantically and structurally, on the one hand to describe its content and, on the other, to understand its geometry. However, while real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during the image formation process, and it is therefore non-trivial to describe the geometry of a scene from 2D images of it.
There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented at that pixel. The automatic estimation of a distance map of the scene from an image is a critical algorithmic building block in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids).
Although depth estimation from a single image is a difficult and inherently ill-posed problem, we know that humans can judge distances with one eye. This capacity is not innate but acquired, and it is made possible largely by identifying cues that reflect prior knowledge of the objects around us. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently enabled major breakthroughs in many fields, and we study the case of monocular depth estimation.
|
37 |
Improving deep monocular depth predictions using dense narrow field of view depth images. Möckelind, Christoffer, January 2018
In this work we study a depth prediction problem where we provide a narrow field of view depth image and a wide field of view RGB image to a deep network tasked with predicting depth for the entire RGB image. We show that by providing a narrow field of view depth image, we improve results for the area outside the provided depth compared to an earlier approach using only a single RGB image for depth prediction. We also show that larger depth maps provide a greater advantage than smaller ones, and that the accuracy of the model decreases with distance from the provided depth. Further, we investigate several architectures and study the effect of adding noise to and lowering the resolution of the provided depth image. Our results show that models given low-resolution, noisy depth perform on par with models given the unaltered depth.
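One straightforward way to feed both inputs to a network, sketched below, is early fusion: pad the narrow depth to the RGB resolution and stack it as a fourth channel. This layout is an assumption for illustration; the thesis compares several architectures, and the zero-padding convention is not taken from it.

```python
import numpy as np

def build_input(rgb, narrow_depth, top, left):
    """Stack a wide-FOV RGB image (H, W, 3) with a narrow-FOV depth
    image as a 4-channel network input; pixels outside the narrow FOV
    stay 0, which doubles as an implicit validity mask."""
    h, w, _ = rgb.shape
    depth_ch = np.zeros((h, w, 1), dtype=np.float32)
    dh, dw = narrow_depth.shape
    depth_ch[top:top + dh, left:left + dw, 0] = narrow_depth
    return np.concatenate([rgb.astype(np.float32), depth_ch], axis=-1)

x = build_input(np.zeros((480, 640, 3)), np.ones((120, 160)), top=180, left=240)
print(x.shape)  # (480, 640, 4)
```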
|
38 |
An Analysis of Camera Configurations and Depth Estimation Algorithms for Triple-Camera Computer Vision Systems. Peter-Contesse, Jared, 01 December 2021
The ability to accurately map and localize relevant objects surrounding a vehicle is an important task for autonomous vehicle systems. Currently, many environmental mapping approaches rely on expensive LiDAR sensors. Researchers have been attempting to transition to cheaper sensors like cameras, but so far the mapping accuracy of single-camera and dual-camera systems has not matched that of LiDAR systems. This thesis examines depth estimation algorithms and camera configurations of a triple-camera system to determine whether sensor data from an additional perspective improves the accuracy of camera-based systems. Using a synthetic dataset, the performance of a selection of stereo depth estimation algorithms is compared to the performance of two triple-camera depth estimation algorithms: disparity fusion and cost fusion. The cost fusion algorithm, in both a multi-baseline and a multi-axis triple-camera configuration, outperformed the environmental mapping accuracy of non-CNN algorithms in a two-camera configuration.
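The cost-fusion idea can be pictured generically: build a matching-cost volume per stereo pair, sum the volumes, then take the winner-take-all disparity. The sketch below is a textbook multi-baseline formulation offered as an illustration, not the exact algorithm evaluated in the thesis; it assumes both pairs share the reference (center) camera and have their disparities rescaled to a common baseline beforehand.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_cost_volume(ref, src, max_disp, block=5):
    """Block-averaged absolute-difference (scaled SAD) matching cost of
    `ref` against `src` for each candidate disparity d = 0..max_disp-1."""
    h, w = ref.shape
    cost = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        shifted = src if d == 0 else src[:, :-d]
        diff = np.abs(ref[:, d:] - shifted)
        cost[d, :, d:] = uniform_filter(diff, size=block)  # aggregate over block
    return cost

# Cost fusion across the two stereo pairs sharing the center camera
# (illustrative names; `right_rescaled` assumes baseline normalization):
# fused = sad_cost_volume(center, left, D) + sad_cost_volume(center, right_rescaled, D)
# disparity = fused.argmin(axis=0)   # winner-take-all
```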
|
39 |
Deep Learning Approaches to Low-level Vision Problems. Liu, Huan, January 2022
Recent years have witnessed tremendous success in using deep learning approaches to handle low-level vision problems. Most deep learning based methods address a low-level vision problem by training a neural network to approximate the mapping from the inputs to the desired ground truths. However, directly learning this mapping is usually difficult and cannot achieve ideal performance. Moreover, in the unsupervised setting, this general deep learning approach cannot be used. In this thesis, we investigate and address several problems in low-level vision using the proposed approaches.
To learn a better mapping using the existing data, an indirect domain shift mechanism is proposed to add explicit constraints inside the neural network for single image dehazing. This allows the neural network to be optimized across several identified neighbours, resulting in better performance.
Despite the success of the proposed approaches in learning an improved mapping from inputs to targets, three problems involving unsupervised learning are also investigated. For unsupervised monocular depth estimation, a teacher-student network is introduced to strategically integrate the benefits of both supervised and unsupervised learning. The teacher network is trained in the binocular depth estimation setting, and the student network is constructed as the primary network for monocular depth estimation. Observing that the performance of the teacher network is far better than that of the student network, a knowledge distillation approach is proposed to help improve the mapping learned by the student.
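A distillation objective of this kind can be sketched as follows; the L1 form and the weighting `alpha` are illustrative assumptions, not values taken from the thesis.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_depth, teacher_depth, gt_depth=None, alpha=0.9):
    """Student regresses toward the stronger teacher's predictions;
    ground truth, when available, contributes the remaining weight.
    `alpha` is an assumed weighting, for illustration only."""
    loss = alpha * F.l1_loss(student_depth, teacher_depth.detach())
    if gt_depth is not None:
        mask = gt_depth > 0                      # supervise valid pixels only
        loss = loss + (1 - alpha) * F.l1_loss(student_depth[mask], gt_depth[mask])
    return loss
```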
For single image dehazing, the current network cannot handle different types of haze patterns, as it is trained on a particular dataset. The problem is formulated as a multi-domain dehazing problem. To address this issue, a test-time training approach is proposed that leverages a helper network to assist the dehazing network in adapting to a particular domain using self-supervision.
In a lossy compression system, the target distribution can differ from that of the source, and ground truths are not available for reference. Thus, the objective is to transform the source into the target under a rate constraint, which generalizes optimal transport. To address this problem, theoretical analyses of the trade-off between compression rate and minimal achievable distortion are carried out for the cases with and without common randomness. A deep learning approach building on our theoretical results is also developed for super-resolution and denoising tasks.
Extensive experiments and analyses have been conducted to prove the effectiveness of the proposed deep learning based methods in handling the problems in low-level vision. / Thesis / Doctor of Philosophy (PhD)
|
40 |
Deep Learning-Based Depth Estimation Models with Monocular SLAM: Impacts of Pure Rotational Movements on Scale Drift and Robustness. Bladh, Daniel, January 2023
This thesis explores the integration of deep learning-based depth estimation models with the ORB-SLAM3 framework to address challenges in monocular Simultaneous Localization and Mapping (SLAM), focusing in particular on pure rotational movements. The study investigates the viability of using pre-trained generic depth estimation networks, and hybrid combinations of these networks, to replace traditional depth sensors and improve scale accuracy in SLAM systems. A series of experiments is conducted outdoors using a custom camera setup designed to isolate pure rotational movements. The analysis assesses each model's impact on the SLAM process, together with key performance indicators (KPIs) for both depth estimation and 3D tracking. The results indicate a correlation between depth estimation accuracy and SLAM performance, underscoring the potential of depth estimation models for enhancing SLAM systems. The findings contribute to the understanding of the role of monocular depth estimation in integration with SLAM, especially in applications requiring precise spatial awareness for augmented reality.
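Scale accuracy of a monocular depth model is commonly assessed by aligning each prediction to a metric reference with a per-frame median scale factor; tracking that factor across a sequence then exposes scale drift. The sketch below follows that common evaluation convention, which is an assumption here, not a procedure quoted from the thesis.

```python
import numpy as np

def median_scale(pred_depth, ref_depth):
    """Per-frame factor aligning a (scale-ambiguous) monocular depth
    prediction to a metric reference, over valid reference pixels."""
    valid = ref_depth > 0
    return np.median(ref_depth[valid] / pred_depth[valid])

# Scale drift over a sequence shows up as variation of this factor,
# e.g. np.std([median_scale(pred_t, ref_t) for each frame t]).
print(median_scale(np.ones((4, 4)), np.full((4, 4), 2.0)))  # 2.0
```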
|