121 |
Jämförelse av punktmoln genererade med terrester laserskanner och drönar-baserad Structure-from-Motion fotogrammetri : En studie om osäkerhet och kvalitet vid detaljmätning och 3D-modellering / Comparison of Point Clouds Generated by Terrestrial Laser Scanning and Structure-from-Motion Photogrammetry with UAVs : A study on uncertainty and quality in detailed measurement and 3D modeling
Nyberg, Emil; Wolski, Alexander. January 2024
Photogrammetry is a crucial method for creating 3D representations of terrain and structures, yet challenges remain regarding accuracy due to factors such as image quality, camera calibration, and positional data. The use of drones for detailed measurement of buildings enables rapid and cost-effective data collection, but accuracy can be affected by image quality and shadows. This thesis compares the accuracy and quality of point clouds generated using two different techniques: terrestrial laser scanning (TLS) and Structure-from-Motion (SfM) photogrammetry with drones. The objective is to test the uncertainty and accuracy of both methods in detailed measurement of residential buildings. Data were collected on a villa using both TLS and a drone equipped with a 48 MP camera, with georeferencing by ground control points (GCP). The SfM point clouds were processed with Agisoft Metashape. The SfM and TLS point clouds were compared in terms of coverage, positional difference, and positional uncertainty. High measurement accuracy was ensured by following the guidelines in HMK - Terrester laserskanning 2021 and applying HMK Standard Level 3. The positional uncertainty of both point clouds was within the tolerances set by HMK - Terrestrial Laser Scanning Standard Level 3. The positional uncertainty check, based on a sample of 41 points, showed that the root mean square error (RMSE) in plan and height was 0.011 m and 0.007 m respectively for the TLS point cloud, and 0.02 m and 0.015 m for the drone-SfM point cloud, both within the tolerance according to HMK - Terrestrial Detail Measurement 2021. The results suggest that Structure-from-Motion photogrammetry with drones can generate point clouds with good detail, although not as accurate as terrestrial laser scanning at its lowest setting. TLS showed less uncertainty in the positional uncertainty check, with approximately half the RMSE in both plan and height. The study found that TLS performs worse on difficult-to-access surfaces with obstructed views and unfavorable angles of incidence, resulting in lower point density. Under favorable conditions, TLS offers higher accuracy and a higher level of detail than SfM point clouds. An M3C2 point cloud analysis, using the TLS point cloud as the reference, showed that the SfM point cloud had its largest errors at the eaves and in shrubbery. The larger errors at the eaves indicate that SfM performs worse in terms of level of detail, and the errors in the shrubbery area are consistent with documented photogrammetric errors in vegetation mapping. SfM can collect data efficiently for larger and difficult-to-access areas but requires extensive processing time with various aids to achieve high accuracy. Conversely, TLS requires a long data collection process but can generate a detailed and accurate point cloud directly, without lengthy processing. The choice of method therefore depends on specific project requirements. Long-term implications include improved efficiency and safety in construction and infrastructure projects, as well as potential cost savings and more detailed inspections.
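The positional uncertainty check reported above boils down to an RMSE over paired check points. The following is a minimal sketch of that arithmetic, not the authors' procedure: the function name `rmse_plan_height` is hypothetical, and it assumes that the plan RMSE is computed from the radial (x, y) deviation and the height RMSE from the z deviation, with all coordinates in metres.

```python
import math


def rmse_plan_height(measured, reference):
    """Return (plan RMSE, height RMSE) for paired (x, y, z) points in metres.

    Assumption: plan error is the radial distance in the x-y plane and
    height error is the difference in z; the thesis may define these
    differently.
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty point lists of equal length")
    plan_sq = 0.0
    height_sq = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(measured, reference):
        plan_sq += (xm - xr) ** 2 + (ym - yr) ** 2
        height_sq += (zm - zr) ** 2
    n = len(measured)
    return math.sqrt(plan_sq / n), math.sqrt(height_sq / n)
```

In use, `measured` would hold the check-point coordinates picked from a point cloud and `reference` the independently surveyed coordinates of the same points; the two returned values are then compared against the applicable tolerance.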
|
122 |
Contributions au recalage et à la reconstruction 3D de surfaces déformables / Contributions to the registration and 3D reconstruction of deformable surfaces
Gay-Bellile, Vincent. 10 November 2008
This thesis develops tools for registering images of a deformable surface and for the three-dimensional reconstruction of deformable surfaces from images taken by a single camera. The surfaces we wish to handle are typically a face or a sheet of paper. These problems are ill-posed when only the information present in the images is used. Prior information on the physically admissible deformations of the observed surface must be defined, and it differs depending on the problem studied. For example, for a sheet of paper the Gaussian curvature evaluated at each of its points is zero; this property does not hold for a face. The target applications are the realistic insertion of 2D logos, text, and virtual 3D objects into videos showing a deformable surface. The first part of this thesis is devoted to image registration using deformable models. After briefly introducing the basic notions of deformation functions and their estimation from image data, we give two contributions. The first is an algorithm for registering images of a deformable surface that is efficient in terms of computation time. We propose a primitive-based parameterization of the deformation functions, which allows them to be estimated by compositional algorithms usually reserved for transformations that form a group. The second contribution is the explicit modeling of self-occlusions, obtained by forcing the deformation function to contract along the self-occlusion boundary. The second part of this thesis addresses the monocular three-dimensional reconstruction of deformable surfaces. We rely on the low-rank model: deformations are approximated by a linear combination of unknown deformation modes. We assume that these modes are ordered by importance in terms of the amount of deformation captured in the images. This results in a hierarchical estimation of the modes, which makes it easier to use a perspective camera model, enables automatic selection of the number of modes, and reduces certain ambiguities inherent to the model. Finally, we explore capturing the deformations of a weakly textured surface from data provided by a 3D sensor. In particular, the information present at the contours of the surface is used. We implemented the different contributions described here. They are tested and compared with the state of the art on real and synthetic data. The results are presented throughout the manuscript.
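The low-rank model mentioned above represents each deformed shape as a linear combination of deformation modes. The sketch below only illustrates that combination step; it assumes the modes are added to a mean shape, and the function name `low_rank_shape` and the list-of-tuples layout are hypothetical. It does not cover the estimation of the modes, which is the actual subject of the thesis.

```python
def low_rank_shape(mean_shape, modes, coefficients):
    """Combine deformation modes linearly into one 3D shape.

    mean_shape:   list of (x, y, z) points (assumed rest shape).
    modes:        list of modes, each a list of (dx, dy, dz) per point.
    coefficients: one weight per mode for the current frame.
    Returns mean_shape + sum_k coefficients[k] * modes[k], point by point.
    """
    if len(modes) != len(coefficients):
        raise ValueError("need exactly one coefficient per mode")
    shape = [list(p) for p in mean_shape]
    for weight, mode in zip(coefficients, modes):
        if len(mode) != len(shape):
            raise ValueError("every mode must have one displacement per point")
        for point, displacement in zip(shape, mode):
            for axis in range(3):
                point[axis] += weight * displacement[axis]
    return [tuple(p) for p in shape]
```

Ordering the modes by importance, as the abstract describes, would correspond here to truncating `modes` and `coefficients` to their first few entries.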
|
123 |
SLAM temporel à contraintes multiples / Multiple constraints and temporal SLAM
Ramadasan, Datta. 15 December 2015
This report describes my doctoral thesis work, conducted within the ComSee (Computers That See) team, which belongs to the ISPR axis (Image, Perception Systems and Robotics) of Institut Pascal. It was funded by the Auvergne Region and the European Regional Development Fund. The work addresses localization for mobile robotics and Augmented Reality. The framework developed during this thesis is a generic approach for implementing SLAM (Simultaneous Localization And Mapping) applications, that is, algorithms that localize a sensor with respect to a model that is reconstructed at the same time. The approach integrates multiple constraints into the localization and mapping processes. These constraints come from sensor data and also from prior knowledge given by the application context. All constraints are used within a single optimization algorithm in order to improve the motion estimate and the accuracy of the reconstructed map. Three problems were tackled. The first concerns constraints on the map, used to accurately estimate the pose of partially known 3D objects present in the environment. The second is the fusion of multi-sensor data, which is heterogeneous and asynchronous, using a single optimization algorithm. The last is the automatic and efficient generation of multi-constraint optimization algorithms, with the goal of a real-time solution to multi-constraint SLAM problems. A generic approach is used to design the framework so that it can handle the many configurations arising from the different constraints of each SLAM problem. Particular attention was paid to low resource consumption (memory and CPU) while maintaining high portability. Moreover, meta-programming is used to automatically generate the most complex parts of the code specifically for the problem to be solved. The LMA optimization library, developed during this thesis, is made available to the community as open source. Experiments are presented on both synthetic and real data. An exhaustive benchmark shows the performance of the LMA library compared with the most widely used state-of-the-art alternatives. In addition, the SLAM framework is applied to problems of increasing difficulty and with an increasing number of constraints. The mobile robotics and Augmented Reality applications demonstrate real-time performance and a level of accuracy that grows with the number of constraints used.
|
124 |
Manifold clustering for motion segmentation
Zappella, Luca. 30 June 2011
This study addresses the problem of motion segmentation. The state of the art is reviewed, the main features of motion segmentation algorithms are analysed, and a classification of the most recent and important techniques is proposed. The segmentation problem can be cast as a manifold clustering problem, and this study tackles some of the most challenging issues in motion segmentation via manifold clustering. New algorithms for estimating the rank of the trajectory matrix are proposed, a measure of similarity between subspaces is presented, the behaviour of principal angles is discussed, and a generic tool for estimating the number of motions in a sequence is developed. The last part of the study is dedicated to an algorithm for correcting an initial motion segmentation solution. The correction is achieved by bringing together the problems of motion segmentation and structure from motion.
|
125 |
Bearing-only SLAM : a vision-based navigation system for autonomous robots
Huang, Henry. January 2008
To navigate successfully in a previously unexplored environment, a mobile robot must be able to estimate the spatial relationships of the objects of interest accurately. A Simultaneous Localization and Mapping (SLAM) system employs its sensors to incrementally build a map of its surroundings and to localize itself in the map simultaneously. The aim of this research project is to develop a SLAM system suitable for self-propelled household lawnmowers. The proposed bearing-only SLAM system requires only an omnidirectional camera and some inexpensive landmarks. The main advantage of an omnidirectional camera is the panoramic view of all the landmarks in the scene. Placing landmarks in a lawn field to define the working domain is much easier and more flexible than installing the perimeter wire required by existing autonomous lawnmowers. Existing bearing-only SLAM methods commonly rely on a motion model for predicting the robot's pose and a sensor model for updating the pose. In the motion model, the error on the estimates of object positions accumulates, mainly due to wheel slippage. Accurately quantifying the uncertainty of object positions is a fundamental requirement. In bearing-only SLAM, the Probability Density Function (PDF) of the landmark position should be uniform along the observed bearing. Existing methods that approximate the PDF with a Gaussian estimation do not satisfy this uniformity requirement. This thesis introduces both geometric and probabilistic methods to address the above problems. The main novel contributions of this thesis are:
1. A bearing-only SLAM method not requiring odometry. The proposed method relies solely on the sensor model (landmark bearings only) without relying on the motion model (odometry). The uncertainty of the estimated landmark positions depends on the vision error only, instead of the combination of both odometry and vision errors.
2. The transformation of the spatial uncertainty of objects. This thesis introduces a novel method for translating the spatial uncertainty of objects estimated from a moving frame attached to the robot into the global frame attached to the static landmarks in the environment.
3. The characterization of an improved PDF for representing landmark position in bearing-only SLAM. The proposed PDF is expressed in polar coordinates, and the marginal probability on range is constrained to be uniform. Compared to the PDF estimated from a mixture of Gaussians, the PDF developed here has far fewer parameters and can be easily adopted in a probabilistic framework, such as a particle filtering system.
The main advantages of our proposed bearing-only SLAM system are its lower production cost and flexibility of use. The proposed system can be adopted in other domestic robots as well, such as vacuum cleaners or robotic toys, when the terrain is essentially 2D.
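A polar-coordinate landmark PDF with a uniform marginal on range is straightforward to sample, which is one reason it fits a particle filter. The sketch below illustrates the idea only and is not the thesis's formulation: the function name `sample_landmark_hypotheses`, the range bounds `r_min` and `r_max`, and the Gaussian model for the bearing error are all assumptions made here for illustration.

```python
import math
import random


def sample_landmark_hypotheses(robot_xy, bearing, bearing_sigma,
                               r_min, r_max, n, rng=None):
    """Draw n landmark position hypotheses from one bearing observation.

    Range is sampled uniformly in [r_min, r_max] (uniform marginal on
    range); the bearing gets Gaussian noise with standard deviation
    bearing_sigma in radians (an assumption, not taken from the thesis).
    Returns a list of (x, y) points in the global frame.
    """
    rng = rng or random.Random()
    hypotheses = []
    for _ in range(n):
        r = rng.uniform(r_min, r_max)
        theta = rng.gauss(bearing, bearing_sigma)
        hypotheses.append((robot_xy[0] + r * math.cos(theta),
                           robot_xy[1] + r * math.sin(theta)))
    return hypotheses
```

In a particle filter, each hypothesis would then be reweighted as further bearings to the same landmark are observed from other robot positions, so that the range ambiguity of a single observation is gradually resolved.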
|
126 |
Road Surface Preview Estimation Using a Monocular Camera
Ekström, Marcus. January 2018
Recently, sensors such as radars and cameras have been widely used in automotive applications, especially in Advanced Driver-Assistance Systems (ADAS), to collect information about the vehicle's surroundings. Stereo cameras are very popular because they can be used passively to construct a 3D representation of the scene in front of the car. This has allowed the development of several ADAS algorithms that need 3D information to perform their tasks. One interesting application is Road Surface Preview (RSP), where the task is to estimate the road height along the future path of the vehicle. An active suspension control unit can then use this information to regulate the suspension, improving driving comfort, extending the durability of the vehicle, and warning the driver about potential risks on the road surface. Stereo cameras have been used successfully in RSP and have demonstrated very good performance. However, their main disadvantages are high production cost and high power consumption, which limits the installation of several ADAS features in economy-class vehicles. A less expensive alternative is the monocular camera, which has a significantly lower cost and power consumption. This thesis therefore investigates the possibility of solving the Road Surface Preview task using a monocular camera. We try two different approaches: structure-from-motion and Convolutional Neural Networks (CNNs). The proposed methods are evaluated against the stereo-based system. Experiments show that both structure-from-motion and CNNs have good potential for solving the problem, but they are not yet reliable enough to be a complete solution to the RSP task and to be used in an active suspension control unit.
|
127 |
Automatic Volume Estimation Using Structure-from-Motion Fused with a Cellphone's Inertial Sensors
Fallqvist, Marcus. January 2017
This thesis evaluates a method for estimating the volume of large-scale objects, namely stone and gravel piles, in outdoor environments using only a cellphone, which collects video together with sensor data from its gyroscopes and accelerometers. The project was commissioned by Escenda Engineering, with the motivation of replacing more complex and resource-demanding systems with a cheaper, easy-to-use handheld device. The implementation uses popular computer vision methods such as KLT (Kanade-Lucas-Tomasi) point tracking, Structure-from-Motion, and space carving, together with some simple sensor fusion. The results imply that volume estimation is possible, but the accuracy is limited by sensor quality and the estimates carry a bias.
|
128 |
Reducing Energy Consumption Through Image Compression / Reducera energiförbrukning genom bildkompression
Ferdeen, Mats. January 2016
The energy consumed by off-chip memory reads and writes is a known problem. In the image processing field of structure from motion, simpler compression techniques could be used to save energy. The trade-off between the features that can still be detected (corners, edges, etc.) and the degree of compression then becomes an important question. This thesis studies that trade-off in depth. A number of more advanced compression algorithms for still images, such as JPEG, are compared with a selection of simpler compression algorithms. The simpler algorithms fall into two categories: individual block-wise compression of each image, and compression with respect to all pixels in each image. The image sequences in this study are in grayscale and come from an earlier study on rolling shutters. Synthetic data sets from a further study on optical flow are also included to assess how reliable the other data sets are.
|
129 |
3D Object Detection based on Unsupervised Depth Estimation
Manoharan, Shanmugapriyan. 25 January 2022
Estimating depth and detecting object instances in 3D space are fundamental to autonomous navigation, localization and mapping, robotic object manipulation, and augmented reality. RGB-D images and LiDAR point clouds are the most illustrative formats of depth information. However, depth sensors have many shortcomings, such as low effective spatial resolution and capturing a scene from a single perspective.
The thesis focuses on reproducing a denser and more comprehensive 3D scene structure from given monocular RGB images using depth estimation and 3D object detection.
The first contribution of this thesis is a pipeline for depth estimation based on an unsupervised learning framework. The thesis proposes two architectures to analyze structure-from-motion and 3D geometric constraint methods. The proposed architectures are trained and evaluated using only RGB images and no ground-truth depth data. The architecture proposed in this thesis achieved better results than the state-of-the-art methods.
The second contribution of this thesis is the application of the estimated depth map, which includes two algorithms: point cloud generation and collision avoidance. The predicted depth map and the RGB image are used to generate point cloud data using the proposed point cloud algorithm. The collision avoidance algorithm predicts the possibility of a collision and issues a collision warning message by decoding the colors in the estimated depth map. The algorithm can be adapted to different color maps with slight changes and perceives collision information across a sequence of frames.
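Generating a point cloud from a depth map and an RGB image is commonly done by back-projecting each pixel through a pinhole camera model. The sketch below shows that standard technique as an illustration; it is not the thesis's proposed algorithm, and the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions introduced here.

```python
def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map into a colored point cloud.

    depth: list of rows of depth values (metres) along the optical axis.
    rgb:   list of rows of (r, g, b) tuples, same size as depth.
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed known).
    Pixels with non-positive depth are skipped as invalid.
    Returns a list of (x, y, z, r, g, b) tuples in the camera frame.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx  # horizontal offset scales with depth
            y = (v - cy) * z / fy  # vertical offset scales with depth
            points.append((x, y, z) + tuple(rgb[v][u]))
    return points
```

A depth map predicted by an unsupervised monocular network is typically known only up to scale, so a cloud produced this way would inherit that scale ambiguity unless it is resolved separately.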
Our third contribution is a two-stage pipeline for detecting 3D objects from a monocular image. The first stage detects the 2D objects and crops the corresponding image patches, which are passed as input to the second stage. In the second stage, a 3D regression network is trained to estimate the 3D bounding boxes of the target objects. Two architectures are proposed for this 3D regression network. The approach achieves better average precision than the state of the art for fully visible objects or objects with a truncation of 15%, and lower but comparable results for objects with a truncation of more than 30% or for partly or fully occluded objects.
|
130 |
Registration and Localization of Unknown Moving Objects in Markerless Monocular SLAM
Blake Austin Troutman (15305962). 18 May 2023
Simultaneous localization and mapping (SLAM) is a general device localization technique that uses real-time sensor measurements to develop a virtualization of the sensor's environment while also using this growing virtualization to determine the position and orientation of the sensor. This is useful for augmented reality (AR), in which a user looks through a head-mounted display (HMD) or viewfinder to see virtual components integrated into the real world. Visual SLAM (i.e., SLAM in which the sensor is an optical camera) is used in AR to determine the exact device/headset movement so that the virtual components can be accurately redrawn to the screen, matching the perceived motion of the world around the user as the user moves the device/headset. However, many potential AR applications may need access to more than device localization data in order to be useful; they may need to leverage environment data as well. Additionally, most SLAM solutions make the naive assumption that the environment surrounding the system is completely static (non-moving). Given these circumstances, it is clear that AR may benefit substantially from utilizing a SLAM solution that detects objects that move in the scene and ultimately provides localization data for each of these objects. This problem is known as the dynamic SLAM problem. Current attempts to address the dynamic SLAM problem often use machine learning to develop models that identify the parts of the camera image that belong to one of many classes of potentially-moving objects. The limitation with these approaches is that it is impractical to train models to identify every possible object that moves; additionally, some potentially-moving objects may be static in the scene, which these approaches often do not account for. Some other attempts to address the dynamic SLAM problem also localize the moving objects they detect, but these systems almost always rely on depth sensors or stereo camera configurations, which have significant limitations in real-world use cases. This dissertation presents a novel approach for registering and localizing unknown moving objects in the context of markerless, monocular, keyframe-based SLAM with no required prior information about object structure, appearance, or existence. This work also details a novel deep learning solution for determining SLAM map initialization suitability in structure-from-motion-based initialization approaches. This dissertation goes on to validate these approaches by implementing them in a markerless, monocular SLAM system called LUMO-SLAM, which is built from the ground up to demonstrate this approach to unknown moving object registration and localization. Results are collected for the LUMO-SLAM system, which address the accuracy of its camera localization estimates, the accuracy of its moving object localization estimates, and the consistency with which it registers moving objects in the scene. These results show that this solution to the dynamic SLAM problem, though it does not act as a practical solution for all use cases, has an ability to accurately register and localize unknown moving objects in such a way that makes it useful for some applications of AR without thwarting the system's ability to also perform accurate camera localization.
|