121 |
Monocular Visual Odometry for Autonomous Underwater Navigation : An analysis of learning-based monocular visual odometry approaches in underwater scenarios / Monokulär Visuell Odometri för Autonom Undervattensnavigering : En analys av inlärningsbaserade monokulära visuella odometri-metoder i undervattensscenarier
Caraffa, Andrea January 2021
Visual Odometry (VO) is the process of estimating the relative motion of a vehicle using solely image data gathered from a camera. In underwater environments, VO becomes extremely challenging but valuable, since ordinary sensors for on-road localization are usually impractical in these hostile environments. For years, VO methods have been based purely on Computer Vision (CV) principles. However, recent advances in Deep Learning (DL) have ushered in a new era for VO approaches. These novel methods have achieved impressive performance, with state-of-the-art results on urban datasets. Nevertheless, little effort has been made to push learning-based research towards natural environments, such as underwater. Consequently, this work aims to bridge the research gap by evaluating the effectiveness of the learning-based approach in the navigation of Autonomous Underwater Vehicles (AUVs). We compare two learning-based methods with a traditional feature-based method on the Underwater Caves dataset, a very challenging dataset collected in the unstructured environment of an underwater cave complex. Extensive experiments are conducted by training the models on this dataset. Moreover, we investigate different aspects and propose several improvements, such as sub-sampling the video clips to emphasize the camera motion between consecutive frames, or training exclusively on images with relevant content, discarding those with dark borders and those showing only sandy bottoms. Finally, during training, we also leverage underwater images from other datasets, which were acquired with different cameras. However, the best improvement is obtained by penalizing rotations around the x-axis of the camera coordinate system. The three methods are evaluated on test sequences that cover different lighting conditions. In the most favorable environments, although learning-based methods are not on par with the feature-based method, the results show great potential. 
Furthermore, in extreme lighting conditions, where the feature-based baseline fails outright to bootstrap, one of the two learning-based methods instead produces qualitatively good trajectories, revealing the power of the learning-based approach in this particular context. / Visual Odometry (VO) is used to estimate the relative motion of a vehicle using only image data from one or more cameras. In underwater environments, VO becomes extremely challenging but valuable, since common localization sensors are usually impractical in these harsh environments. For years, VO methods were based solely on classical computer vision. Recent advances in deep learning have, however, opened a new era for VO methods. These new methods have achieved impressive performance on datasets from urban environments. Despite this, rather little has been done to push learning-based research towards natural environments, such as underwater. Consequently, this work aims to bridge the research gap by evaluating the effectiveness of the learning-based approach for the navigation of Autonomous Underwater Vehicles (AUVs). We compare two learning-based methods with a traditional keypoint-based method used as a reference. The comparison is made on the Underwater Caves dataset, a very challenging dataset collected in the unstructured environment of an underwater cave complex. Extensive experiments are carried out to train the models on this dataset. We also investigate different aspects and propose several improvements, for example sub-sampling the video clips to emphasize the camera motion between consecutive frames, or training on a subset of the dataset consisting exclusively of images with relevant content in order to improve the motion estimate. During training, we also make use of underwater images from other datasets, and hence from different cameras. 
The best improvement, however, is obtained by penalizing estimates of large rotations around the x-axis of the camera coordinate system. The three methods are evaluated on test sequences covering different lighting conditions. In the most favorable environments, the results show great potential, even though the learning-based methods are not on par with the traditional reference method. In extreme lighting conditions, where the reference method fails even to initialize, one of the two learning-based methods instead gives qualitatively good results, which demonstrates the power of the learning-based approach in this specific context.
|
122 |
Deep Learning-Based Depth Estimation Models with Monocular SLAM : Impacts of Pure Rotational Movements on Scale Drift and Robustness
Bladh, Daniel January 2023
This thesis explores the integration of deep learning-based depth estimation models with the ORB-SLAM3 framework to address challenges in monocular Simultaneous Localization and Mapping (SLAM), focusing in particular on pure rotational movements. The study investigates the viability of using pre-trained generic depth estimation networks, and hybrid combinations of these networks, to replace traditional depth sensors and improve scale accuracy in SLAM systems. A series of experiments is conducted outdoors, using a custom camera setup designed to isolate pure rotational movements. The analysis assesses each model's impact on the SLAM process as well as key performance indicators (KPIs) for both depth estimation and 3D tracking. Results indicate a correlation between depth estimation accuracy and SLAM performance, underscoring the potential of depth estimation models to enhance SLAM systems. The findings contribute to the understanding of the role of monocular depth estimation when integrated with SLAM, especially in applications requiring precise spatial awareness for augmented reality. / This thesis explores the integration of deep learning-based depth estimation models with the ORB-SLAM3 framework to address challenges in monocular Simultaneous Localization and Mapping (SLAM), with particular focus on pure rotational movements. The study investigates the possibility of using pre-trained generic depth estimation networks, and hybrid combinations of these networks, to replace traditional depth sensors and improve scale accuracy in SLAM systems. A series of experiments is carried out using a custom-built camera setup designed to isolate pure rotational movements. The analysis covers an assessment of each model's impact on the SLAM process as well as quantitative key performance indicators (KPIs) for both depth estimation and tracking. 
The results show a relationship between depth estimation accuracy and SLAM performance, which underscores the potential of depth estimation models for improving SLAM systems. The findings contribute to the understanding of the role that monocular depth estimation plays when integrated with SLAM, especially in applications that require precise spatial awareness.
|
123 |
Monocular Dynamic Motion Capture : A Regression-Optimization Hybrid Approach / Monocular Dynamic Motion Capture : En Regressions-Optimering Hybrid Metod
Charisoudis, Athanasios January 2024
Recovering 3D human motion from monocular video sequences poses a significant challenge in computer vision, particularly when the camera itself is in motion. The ambiguity introduced by dynamic recording setups necessitates methods to lift camera-local 3D human motions into a consistent, global world frame. This thesis proposes a novel, modular approach to monocular multi-person motion capture, combining regression techniques and global optimization for enhanced accuracy. Our pipeline for 3D motion recovery begins with image-based detection to localize multiple human subjects within each frame. We then fit parametric human body models (SMPL) to estimate the subjects’ 3D poses, resulting in camera-local human pose tracks. To recover camera motion, we implement a visual odometry (VO) algorithm. Next, we port a state-of-the-art global motion regression network to initially lift camera-local motions into a fixed world frame. Finally, we apply a global optimization process guided by re-projection quality, motion realism, and motion smoothness to refine the lifted motion estimates within the global 3D world frame. The core contribution of this thesis is the demonstration of the effectiveness of combining global motion regression with optimization in a chained manner. Ablation studies confirm that this hybrid approach yields superior results compared to the isolated use of either regression or optimization techniques. Our experimental results show that the proposed method achieves performance closely aligned with the state of the art in SMPL-based human motion recovery. / Recovering human 3D motion from monocular video sequences poses a significant challenge in computer vision, especially when the camera itself is moving. The ambiguity introduced by dynamic recording setups requires methods for lifting camera-local 3D human motions into a consistent global world frame. 
This thesis proposes a new, modular approach to monocular multi-person motion capture that combines regression techniques and global optimization for increased accuracy. Our pipeline for 3D motion recovery begins with image-based detection to localize multiple human subjects within each frame. We then fit parametric human body models (SMPL) to estimate the subjects' 3D poses, which results in camera-local human pose tracks. To recover camera motion, we implement a visual odometry (VO) algorithm. Next, we port a state-of-the-art global motion regression network to initially lift camera-local motions into a fixed world frame. Finally, we apply a global optimization process guided by re-projection quality, motion realism, and motion smoothness to refine the lifted motion estimates within the global 3D world frame. The core contribution of this thesis is the demonstration of the effectiveness of combining global motion regression with optimization in a chained manner. Ablation studies confirm that this hybrid method gives superior results compared to the isolated use of either regression or optimization techniques. Our experimental results show that the proposed method achieves performance close to the state of the art in SMPL-based human motion recovery.
|
124 |
Table tennis event detection and classification
Oldham, Kevin M. January 2015
It is well understood that multiple video cameras and computer vision (CV) technology can be used in sport for match officiating, statistics and player performance analysis. A review of the literature reveals a number of existing solutions, both commercial and theoretical, within this domain. However, these solutions are expensive and often complex to install. The hypothesis for this research states that by considering only changes in ball motion, automatic event classification is achievable with low-cost monocular video recording devices, without the need for 3-dimensional (3D) positional ball data and representation. The focus of this research is a rigorous empirical study of low-cost, single consumer-grade video camera solutions applied to table tennis, confirming that ball location data detected by monocular CV contains sufficient information to enable key match-play events to be recognised and measured. In total, a library of 276 event-based video sequences, recorded with a range of hardware, was produced for this research. The research has four key considerations: i) an investigation into an effective recording environment with minimum configuration and calibration, ii) the selection and optimisation of a CV algorithm to detect the ball from the resulting single-source video data, iii) validation of the accuracy of the 2-dimensional (2D) CV data for motion change detection, and iv) the data requirements and processing techniques necessary to automatically detect changes in ball motion and match those to match-play events. Throughout the thesis, table tennis has been chosen as the example sport for observational and experimental analysis since it offers a number of specific CV challenges due to the relatively high ball speed (in excess of 100 km/h) and small ball size (40 mm in diameter). Furthermore, the inherent rules of table tennis show potential for a monocular event classification vision system. 
As the initial stage, a proposed optimum location and configuration of the single camera is defined. Next, the selection of a CV algorithm is critical in obtaining usable ball motion data. It is shown in this research that segmentation processes vary in their ball detection capabilities and location outputs, which ultimately affects the ability of automated event detection and decision-making solutions. Therefore, a comparison of CV algorithms is necessary to establish confidence in the accuracy of the derived location of the ball. As part of the research, a CV software environment has been developed to allow robust, repeatable and direct comparisons between different CV algorithms. An event-based method of evaluating the success of a CV algorithm is proposed. Comparison of CV algorithms is made against the novel Efficacy Metric Set (EMS), producing a measurable Relative Efficacy Index (REI). Within the context of this low-cost, single-camera ball trajectory and event investigation, the experimental results show that the Horn-Schunck Optical Flow algorithm, with an REI of 163.5, is the most successful method when compared to a discrete selection of CV detection and extraction techniques gathered from the literature review. Furthermore, evidence-based data from the REI also suggests switching to the Canny edge detector (an REI of 186.4) for segmentation of the ball when in close proximity to the net. In addition to and in support of the data generated from the CV software environment, a novel method is presented for producing simultaneous data from 3D marker-based recordings, reduced to 2D and compared directly to the CV output to establish comparative time-resolved data for the ball location. It is proposed here that a continuous scale factor, based on the known dimensions of the ball, is incorporated at every frame. Using this method, comparison results show a mean accuracy of 3.01 mm when applied to a selection of nineteen video sequences and events. 
This tolerance is within 10% of the diameter of the ball and is attributable to the limits of image resolution. Further experimental results demonstrate the ability to identify a number of match-play events from a monocular image sequence using a combination of the suggested optimum algorithm and ball motion analysis methods. The results show a promising application of 2D CV processing to match-play event classification, with an overall success rate of 95.9%. The majority of failures occur when the ball, during returns and services, is partially occluded by either the player or the racket, an inherent problem of using a monocular recording device. Finally, the thesis proposes further research and extensions for developing and implementing monocular CV processing for motion-based event analysis and classification in a wider range of applications.
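The per-frame scale factor described in this abstract can be illustrated with a minimal sketch. It assumes the ball's apparent diameter in pixels is already available from the segmentation step; the function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
import math

BALL_DIAMETER_MM = 40.0  # ball size stated in the abstract


def mm_per_pixel(ball_diameter_px: float) -> float:
    """Scale factor for one frame, from the ball's apparent diameter in pixels."""
    if ball_diameter_px <= 0:
        raise ValueError("ball diameter in pixels must be positive")
    return BALL_DIAMETER_MM / ball_diameter_px


def location_error_mm(cv_xy, ref_xy, ball_diameter_px: float) -> float:
    """Distance in mm between a CV-detected ball centre and a reference centre.

    Both points are (x, y) pixel coordinates in the same frame.
    """
    dx = cv_xy[0] - ref_xy[0]
    dy = cv_xy[1] - ref_xy[1]
    return math.hypot(dx, dy) * mm_per_pixel(ball_diameter_px)


# Hypothetical frame: the ball appears 20 px wide, so 1 px corresponds to 2 mm.
error = location_error_mm((103.0, 54.0), (100.0, 50.0), ball_diameter_px=20.0)
```

Recomputing the factor at every frame, as the abstract proposes, lets the conversion follow the ball as its distance from the camera changes.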
|
125 |
Vers le vol à voile longue distance pour drones autonomes / Towards Vision-Based Autonomous Cross-Country Soaring for UAVs
Stolle, Martin Tobias 03 April 2017
Small fixed-wing drones serve the research, military, and industrial sectors, but still suffer from limited range and payload. Thermal soaring makes it possible to reduce energy consumption. However, without remote sensing of updrafts, a drone can only benefit from an updraft by encountering it by chance. In this thesis, a new framework for autonomous cross-country soaring is developed, allowing a glider drone to visually locate sub-cumulus updrafts and harvest their energy efficiently. Relying on the Unscented Kalman Filter, a monocular vision method is established for estimating updraft parameters. Its ability to provide convergent and consistent estimates is evaluated through Monte Carlo simulations. Model uncertainties, image processing noise, and observer trajectories can degrade these estimates. Consequently, a second focus of this thesis is the design of a robust trajectory planner based on updraft maps. The planner trades off flight time against the risk of a forced landing in the fields, while taking estimation uncertainties into account in the decision-making process. It is shown that the computational load of the proposed trajectory planner is feasible on an inexpensive computing platform. The proposed estimation and planning algorithms are evaluated jointly in a six-axis flight simulator, highlighting significant improvements over current autonomous cross-country soaring flights. / Small fixed-wing Unmanned Aerial Vehicles (UAVs) provide utility to research, military, and industrial sectors at comparatively reasonable cost, but still suffer from both limited operational ranges and payload capacities. 
Thermal soaring flight for UAVs offers significant potential to reduce energy consumption. However, without remote sensing of updrafts, a glider UAV can only benefit from an updraft when encountering it by chance. In this thesis, a new framework for autonomous cross-country soaring is elaborated, enabling a glider UAV to visually localize sub-cumulus thermal updrafts and to efficiently gain energy from them. Relying on the Unscented Kalman Filter, a monocular vision-based method is established for remotely estimating sub-cumulus updraft parameters. Its capability of providing convergent and consistent state estimates is assessed using Monte Carlo simulations. Model uncertainties, image processing noise, and poor observer trajectories can degrade the estimated updraft parameters. Therefore, a second focus of this thesis is the design of a robust probabilistic path planner for map-based autonomous cross-country soaring. The proposed path planner balances flight time against outlanding risk by taking the estimation uncertainties into account in the decision-making process. The suggested updraft estimation and path planning algorithms are jointly assessed in a 6 Degrees Of Freedom simulator, highlighting significant performance improvements with respect to state-of-the-art approaches in autonomous cross-country soaring, while it is also shown that the path planner is implementable on a low-cost computer platform.
|
126 |
Ground Plane Feature Detection in Mobile Vision-Aided Inertial Navigation
Panahandeh, Ghazaleh, Mohammadiha, Nasser, Jansson, Magnus January 2012
In this paper, a method for determining ground plane features in a sequence of images captured by a mobile camera is presented. The hardware of the mobile system consists of a monocular camera that is mounted on an inertial measurement unit (IMU). An image processing procedure is proposed, first to extract image features and match them across consecutive image frames, and second to detect the ground plane features using a two-step algorithm. In the first step, the planar homography of the ground plane is constructed using an IMU-camera motion estimation approach. The obtained homography constraints are used to detect the most likely ground features in the sequence of images. To reject the remaining outliers, as the second step, a new plane normal vector computation approach is proposed. To obtain the normal vector of the ground plane, only three pairs of corresponding features are used for a general camera transformation. The normal-based computation approach generalizes the existing methods that are developed for specific camera transformations. Experimental results on real data validate the reliability of the proposed method. / QC 20121107
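The first step of this abstract, testing matched features against the ground-plane homography, can be sketched as follows. This is an illustrative sketch and not the paper's algorithm: it assumes a pinhole camera with known intrinsics K, a relative motion X2 = R X1 + t between the two views (for instance from an IMU-camera estimator), and a plane n·X1 = d expressed in the first camera frame. All names and numbers are hypothetical.

```python
import numpy as np


def plane_homography(K, R, t, n, d):
    """Homography induced by the plane n.X1 = d under the motion X2 = R X1 + t.

    n is the unit plane normal in the first camera frame and d > 0 is the
    distance from the first camera centre to the plane.
    """
    t = np.asarray(t, dtype=float).reshape(3, 1)
    n = np.asarray(n, dtype=float).reshape(1, 3)
    return K @ (R + (t @ n) / d) @ np.linalg.inv(K)


def transfer_error(H, p1, p2):
    """Pixel distance between H applied to p1 and the matched point p2."""
    q = H @ np.array([p1[0], p1[1], 1.0])
    return float(np.linalg.norm(q[:2] / q[2] - np.asarray(p2, dtype=float)))


def project(K, X):
    """Pinhole projection of a 3D point given in the camera frame."""
    x = K @ X
    return x[:2] / x[2]


# Hypothetical setup: camera 1.5 m above the ground, y-axis pointing down.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.02  # small rotation about the y-axis, in radians
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.1, 0.0, 0.05])
H = plane_homography(K, R, t, n=[0.0, 1.0, 0.0], d=1.5)

on_ground = np.array([0.3, 1.5, 4.0])   # lies on the plane y = 1.5
off_ground = np.array([0.3, 0.5, 4.0])  # one metre above the plane
err_on = transfer_error(H, project(K, on_ground), project(K, R @ on_ground + t))
err_off = transfer_error(H, project(K, off_ground), project(K, R @ off_ground + t))
```

A feature whose transfer error stays below a small pixel threshold is a ground-plane candidate. Features off the plane produce a larger error, although some outliers can still pass such a test, which is the case the abstract's second, normal-based step is meant to handle.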
|
127 |
Road Surface Preview Estimation Using a Monocular Camera
Ekström, Marcus January 2018
Recently, sensors such as radars and cameras have been widely used in automotive applications, especially in Advanced Driver-Assistance Systems (ADAS), to collect information about the vehicle's surroundings. Stereo cameras are very popular because they can be used passively to construct a 3D representation of the scene in front of the car. This has allowed the development of several ADAS algorithms that need 3D information to perform their tasks. One interesting application is Road Surface Preview (RSP), where the task is to estimate the road height along the future path of the vehicle. An active suspension control unit can then use this information to regulate the suspension, improving driving comfort, extending the durability of the vehicle and warning the driver about potential risks on the road surface. Stereo cameras have been successfully used in RSP and have demonstrated very good performance. However, the main disadvantages of stereo cameras are their high production cost and high power consumption. This limits the installation of several ADAS features in economy-class vehicles. A less expensive alternative is the monocular camera, which has a significantly lower cost and power consumption. Therefore, this thesis investigates the possibility of solving the Road Surface Preview task using a monocular camera. We try two different approaches: structure-from-motion and Convolutional Neural Networks (CNNs). The proposed methods are evaluated against the stereo-based system. Experiments show that both structure-from-motion and CNNs have good potential for solving the problem, but they are not yet reliable enough to be a complete solution to the RSP task and to be used in an active suspension control unit.
|
128 |
Autonomous navigation and teleoperation of unmanned aerial vehicles using monocular vision / Navigation autonome et télé-opération de véhicules aériens en utilisant la vision monoculaire
Mercado-Ravell, Diego Alberto 04 December 2015
This work addresses, theoretically and practically, the most relevant topics concerning drones in autonomous and semi-autonomous navigation. In keeping with the multidisciplinary nature of the problems studied, a wide variety of techniques and theories are covered in the fields of robotics, automatic control, computer science, computer vision, and embedded systems, among others. Within this thesis, two experimental platforms were developed in order to validate the proposed theory for the autonomous navigation of a drone. The first prototype, developed in the laboratory, is a quadrotor specially designed for outdoor applications. The second platform consists of a low-cost AR.Drone quadrotor manufactured by Parrot. The vehicle is wirelessly connected to a ground station equipped with the Robot Operating System (ROS) and dedicated to testing the proposed vision algorithms and control strategies in an easy, fast, and safe way. The first work carried out was based on data fusion to estimate the position of the drone using inertial sensors and GPS. Two strategies were studied and applied, the Extended Kalman Filter (EKF) and the Particle Filter (PF). Both approaches take into account noisy measurements of the UAV's position, velocity, and orientation. A numerical validation was carried out to test the performance of the algorithms. One task within the scope of this thesis was to design control algorithms for trajectory tracking or for teleoperation. To this end, a control law based on the second-order Sliding Mode approach was proposed. This control technique allows the quadrotor to follow desired trajectories and to avoid frontal collisions if necessary. 
Since the AR.Drone platform is equipped with an attitude autopilot, we used the desired roll and pitch angles as control inputs. The proposed control algorithm gives robustness to the closed-loop system. In addition, a new monocular computer vision technique was used for the localization of a drone. The visual information is fused with the drone's inertial measurements to obtain a good estimate of its position. This technique uses the PTAM (Parallel Tracking and Mapping) algorithm, which obtains a cloud of feature points in the image relative to a scene that serves as a reference. This algorithm does not use targets, markers, or well-defined scenes. The contribution in this methodology was to use the sparse point cloud to detect possible obstacles in front of the vehicle. With this information, we proposed a control algorithm to perform obstacle avoidance. This control law uses potential fields to compute a repulsive force that is applied to the drone. Real-time experiments showed the good performance of the proposed system. The preceding results motivated the design and development of a drone able to interact safely with people and follow them autonomously. A Haar-type cascade classifier was used to detect a person's face. Once the face is detected, a Kalman Filter (KF) is used to improve the detection and an algorithm is used to estimate the relative position of the face. To regulate the position of the drone and keep it at a desired distance from the face, a linear control law was used. / The present document addresses, theoretically and experimentally, the most relevant topics for Unmanned Aerial Vehicles (UAVs) in autonomous and semi-autonomous navigation. 
In accordance with the multidisciplinary nature of the studied problems, a wide range of techniques and theories are covered in the fields of robotics, automatic control, computer science, computer vision and embedded systems, among others. As part of this thesis, two different experimental platforms were developed in order to explore and evaluate various theories and techniques of interest for autonomous navigation. The first prototype is a quadrotor specially designed for outdoor applications and was fully developed in our lab. The second testbed consists of an inexpensive commercial AR.Drone quadrotor, wirelessly connected to a ground station equipped with the Robot Operating System (ROS), and specially intended to test computer vision algorithms and automatic control strategies in an easy, fast and safe way. In addition, this work provides a study of data fusion techniques aimed at enhancing the UAV pose estimation provided by commonly used sensors. Two strategies are evaluated in particular, an Extended Kalman Filter (EKF) and a Particle Filter (PF). Both estimators are adapted for the system under consideration, taking into account noisy measurements of the UAV position, velocity and orientation. Simulations show the performance of the developed algorithms while adding noise from real GPS (Global Positioning System) measurements. Safe and accurate navigation for either autonomous trajectory tracking or haptic teleoperation of quadrotors is presented as well. A second-order Sliding Mode (2-SM) control algorithm is used to track trajectories while avoiding frontal collisions in autonomous flight. The time-scale separation of the translational and rotational dynamics allows us to design position controllers by giving desired references in the roll and pitch angles, which is suitable for quadrotors equipped with an internal attitude controller. The 2-SM control adds robustness to the closed-loop system. 
A Lyapunov-based analysis proves the stability of the system. Vision algorithms are employed to estimate the pose of the vehicle using only monocular SLAM (Simultaneous Localization and Mapping) fused with inertial measurements. The distance to potential obstacles is detected and computed using the sparse depth map from the vision algorithm. For teleoperation tests, a haptic device is employed to feed back information to the pilot about possible collisions by exerting opposing forces. The proposed strategies are successfully tested in real-time experiments, using a low-cost commercial quadrotor. The design and development of a Micro Aerial Vehicle (MAV) able to safely interact with human users by following them autonomously is also achieved in the present work. Once a face is detected by means of a Haar cascade classifier, it is tracked by applying a Kalman Filter (KF), and an estimate of the relative position with respect to the face is obtained at a high rate. A linear Proportional Derivative (PD) controller regulates the UAV’s position in order to keep a constant distance to the face, also employing the extra information available from the UAV’s embedded sensors. Several experiments were carried out under different conditions, showing good performance even in disadvantageous scenarios such as outdoor flight, and robustness against illumination changes, wind perturbations, image noise and the presence of several faces in the same image. Finally, this thesis deals with the problem of implementing a safe and fast transportation system using a quadrotor UAV with a cable-suspended load. The objective consists in transporting the load from one place to another quickly and with minimum swing in the cable.
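The linear PD law used for face following in this abstract can be illustrated with a minimal one-axis sketch. The gains, the sample values and the finite-difference derivative are assumptions made for illustration; the thesis may compute the derivative term differently, for example from the Kalman filter's velocity estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PDController:
    """Discrete PD controller on a single axis (hypothetical gains)."""
    kp: float
    kd: float
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        """Return the control command for the current error sample."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        # Finite-difference derivative; zero on the first sample.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative


# Keep a desired distance to the detected face along the camera axis.
desired_distance_m = 1.5
controller = PDController(kp=2.0, kd=0.5)
first = controller.update(2.5 - desired_distance_m, dt=0.1)   # face 2.5 m away
second = controller.update(2.0 - desired_distance_m, dt=0.1)  # face 2.0 m away
```

In this sketch a positive command means "move toward the face". The derivative term damps the approach, so the second command is already negative while the vehicle is still farther away than desired.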
|
129 |
3D Object Detection based on Unsupervised Depth Estimation
Manoharan, Shanmugapriyan 25 January 2022
Estimating depth and detecting object instances in 3D space is fundamental to autonomous navigation, localization and mapping, robotic object manipulation, and augmented reality. RGB-D images and LiDAR point clouds are the most illustrative formats of depth information. However, depth sensors have many shortcomings, such as low effective spatial resolution and the capture of a scene from a single perspective. The thesis focuses on reproducing a denser and more comprehensive 3D scene structure from given monocular RGB images using depth estimation and 3D object detection.
The first contribution of this thesis is a pipeline for depth estimation based on an unsupervised learning framework. This thesis proposes two architectures to analyze structure-from-motion and 3D geometric constraint methods. The proposed architectures are trained and evaluated using only RGB images and no ground-truth depth data. The architecture proposed in this thesis achieved better results than the state-of-the-art methods.
The second contribution of this thesis is the application of the estimated depth map, which includes two algorithms: point cloud generation and collision avoidance. The predicted depth map and RGB image are used to generate the point cloud data using the proposed point cloud algorithm. The collision avoidance algorithm predicts the possibility of collision and provides a collision warning message based on decoding the color in the estimated depth map. This algorithm design is adaptable to different color maps with slight changes and perceives collision information across the sequence of frames.
Our third contribution is a two-stage pipeline to detect 3D objects from a monocular image. The first stage detects the 2D objects and crops the corresponding image patch, which is provided as input to the second stage. In the second stage, a 3D regression network is trained to estimate the 3D bounding boxes of the target objects. Two architectures are proposed for this 3D regression network model. This approach achieves better average precision than the state of the art for objects with 15% truncation or fully visible objects, and lower but comparable results for objects with more than 30% truncation or partly/fully occluded objects.
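The point cloud generation step mentioned in this abstract is commonly done by back-projecting each depth pixel through a pinhole camera model. The sketch below shows that standard technique and is not the thesis's own algorithm. The intrinsics fx, fy, cx, cy and the function name are assumptions.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy, rgb=None):
    """Back-project a depth map (H x W, metric depth) to an N x 3 point cloud.

    Pixels with non-finite or non-positive depth are dropped. If an RGB image
    (H x W x 3) is given, the matching N x 3 colours are returned as well.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    colors = None if rgb is None else np.asarray(rgb).reshape(-1, 3)[valid]
    return points[valid], colors


# Tiny hypothetical example: a 2 x 2 depth map with one invalid pixel.
depth = np.array([[1.0, 2.0],
                  [0.0, 4.0]])
points, _ = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

With a predicted rather than measured depth map, the quality of the cloud follows the quality of the depth estimate. A depth map known only up to scale would give a cloud that is correct only up to that same scale.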
|
130 |
Dense 3D Point Cloud Representation of a Scene Using Uncalibrated Monocular Vision
Diskin, Yakov 23 May 2013
No description available.
|