341 |
Semantic Labeling of Large Geographic Areas Using Multi-Date and Multi-View Satellite Images and Noisy OpenStreetMap Labels
Bharath Kumar Comandur Jagannathan Raghunathan (9187466) 31 July 2020
This dissertation addresses how to design a convolutional neural network (CNN) that assigns semantic labels to points on the ground, given satellite image coverage of the area and, as ground truth, the noisy labels in OpenStreetMap (OSM). Three factors make the problem challenging: (1) most of the images are likely to have been recorded from off-nadir viewpoints for the area of interest; (2) the user-supplied labels in OSM are frequently inaccurate and, not uncommonly, entirely missing; and (3) the area covered on the ground must be large enough to have engineering utility. As this dissertation demonstrates, solving this problem requires first constructing a DSM (Digital Surface Model) from a stereo fusion of the available images and then using the DSM to map individual pixels in the satellite images to points on the ground. That mapping creates an association between image pixels and the noisy OSM labels. The CNN-based solution we present yields a 4-8% improvement in per-class segmentation IoU (Intersection over Union) scores compared to traditional approaches that use the views independently of one another. The system is end-to-end automated, which facilitates comparing classifiers trained directly on true orthophotos with classifiers trained on the off-nadir images whose predicted labels are subsequently translated to geographic coordinates. This work also presents, for arguably the first time, an in-depth discussion of large-area image alignment and DSM construction using tens of true multi-date and multi-view WorldView-3 satellite images on a distributed OpenStack cloud computing platform.
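For readers unfamiliar with the per-class IoU metric quoted above, the following is a minimal sketch of how it is commonly computed from flattened label arrays. The function name `per_class_iou` and the toy labels are illustrative assumptions; this is not the dissertation's evaluation code.

```python
def per_class_iou(truth, pred, num_classes):
    """Return one IoU score per class (None if the class never occurs).

    truth and pred are equal-length sequences of integer class ids,
    e.g. flattened label maps. Hypothetical helper for illustration only.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for t, p in zip(truth, pred):
        if t == p:
            # Pixel counts toward both intersection and union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Pixel belongs to the union of both the true and predicted class.
            union[t] += 1
            union[p] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]


# Toy example with three classes (made-up labels).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
scores = per_class_iou(truth, pred, 3)
print(scores)
```

On the toy labels, class 0 has one correctly labeled pixel out of three pixels that are class 0 in either map, so its IoU is 1/3; classes 1 and 2 score 2/3 and 1/2.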
|
342 |
Entwicklung und Validierung methodischer Konzepte einer kamerabasierten Durchfahrtshöhenerkennung für Nutzfahrzeuge
Hänert, Stephan 03 July 2020
Die vorliegende Arbeit beschäftigt sich mit der Konzeptionierung und Entwicklung eines neuartigen Fahrerassistenzsystems für Nutzfahrzeuge, welches die lichte Höhe von vor dem Fahrzeug befindlichen Hindernissen berechnet und über einen Abgleich mit der einstellbaren Fahrzeughöhe die Passierbarkeit bestimmt. Dabei werden die von einer Monokamera aufgenommenen Bildsequenzen genutzt, um durch indirekte und direkte Rekonstruktionsverfahren ein 3D-Abbild der Fahrumgebung zu erschaffen. Unter Hinzunahme einer Radodometrie-basierten Eigenbewegungsschätzung wird die erstellte 3D-Repräsentation skaliert und eine Prädiktion der longitudinalen und lateralen Fahrzeugbewegung ermittelt. Basierend auf dem vertikalen Höhenplan der Straßenoberfläche, welcher über die Aneinanderreihung mehrerer Ebenen modelliert wird, erfolgt die Klassifizierung des 3D-Raums in Fahruntergrund, Struktur und potentielle Hindernisse.
Die innerhalb des Fahrschlauchs liegenden Hindernisse werden hinsichtlich ihrer Entfernung und Höhe bewertet. Ein daraus abgeleitetes Warnkonzept dient der optisch-akustischen Signalisierung des Hindernisses im Kombiinstrument des Fahrzeugs. Erfolgt keine entsprechende Reaktion durch den Fahrer, so wird bei kritischen Hindernishöhen eine Notbremsung durchgeführt.
Die geschätzte Eigenbewegung und berechneten Hindernisparameter werden mithilfe von Referenzsensorik bewertet. Dabei kommt eine dGPS-gestützte Inertialplattform sowie ein terrestrischer und mobiler Laserscanner zum Einsatz. Im Rahmen der Arbeit werden verschiedene Umgebungssituationen und Hindernistypen im urbanen und ländlichen Raum untersucht und Aussagen zur Genauigkeit und Zuverlässigkeit des Verfahrens getroffen. Ein wesentlicher Einflussfaktor auf die Dichte und Genauigkeit der 3D-Rekonstruktion ist eine gleichmäßige Umgebungsbeleuchtung innerhalb der Bildsequenzaufnahme. Es wird in diesem Zusammenhang zwingend auf den Einsatz einer Automotive-tauglichen Kamera verwiesen. Die durch die Radodometrie bestimmte Eigenbewegung eignet sich im langsamen Geschwindigkeitsbereich zur Skalierung des 3D-Punktraums. Dieser wiederum sollte durch eine Kombination aus indirektem und direktem Punktrekonstruktionsverfahren erstellt werden. Der indirekte Anteil stützt dabei die Initialisierung des Verfahrens zum Start der Funktion und ermöglicht eine robuste Kameraschätzung. Das direkte Verfahren ermöglicht die Rekonstruktion einer hohen Anzahl an 3D-Punkten auf den Hindernisumrissen, welche zumeist die Unterkante beinhalten. Die Unterkante kann in einer Entfernung bis zu 20 m detektiert und verfolgt werden. Der größte Einflussfaktor auf die Genauigkeit der Berechnung der lichten Höhe von Hindernissen ist die Modellierung des Fahruntergrunds. Zur Reduktion von Ausreißern in der Höhenberechnung eignet sich die Stabilisierung des Verfahrens durch die Nutzung von zeitlich vorher zur Verfügung stehenden Berechnungen. Als weitere Maßnahme zur Stabilisierung wird zudem empfohlen die Hindernisausgabe an den Fahrer und den automatischen Notbremsassistenten mittels einer Hysterese zu stützen.
Das hier vorgestellte System eignet sich für Park- und Rangiervorgänge und ist als kostengünstiges Fahrerassistenzsystem interessant für Pkw mit Aufbauten und leichte Nutzfahrzeuge. / The present work deals with the conception and development of a novel advanced driver assistance system for commercial vehicles, which estimates the clearance height of obstacles in front of the vehicle and determines passability by comparison with the adjustable vehicle height. The image sequences captured by a mono camera are used to create a 3D representation of the driving environment using indirect and direct reconstruction methods. With the aid of a wheel-odometry-based ego-motion estimate, the 3D representation is scaled and a prediction of the longitudinal and lateral movement of the vehicle is determined. Based on the vertical elevation profile of the road surface, which is modelled as a sequence of several planes, the 3D space is classified into driving surface, structure and potential obstacles. The obstacles within the predicted driving corridor are evaluated with regard to their distance and height. A warning concept derived from this signals the obstacle visually and acoustically in the vehicle's instrument cluster. If the driver does not respond accordingly, emergency braking is applied at critical obstacle heights. The estimated ego-motion and the calculated obstacle parameters are evaluated with the aid of reference sensors: a dGPS-supported inertial measurement unit and both a terrestrial and a mobile laser scanner. Within the scope of the work, different environmental situations and obstacle types in urban and rural areas are investigated, and statements on the accuracy and reliability of the implemented function are made.
A major factor influencing the density and accuracy of the 3D reconstruction is uniform ambient lighting throughout the image sequence. In this context, the use of an automotive-grade camera is considered mandatory. The ego-motion determined by wheel odometry is suitable for scaling the 3D point space in the low speed range. The 3D point space, in turn, should be created by a combination of indirect and direct point reconstruction methods. The indirect part supports the initialization phase of the function and enables a robust camera estimation. The direct method enables the reconstruction of a large number of 3D points on the obstacle outlines, which usually contain the lower edge. The lower edge can be detected and tracked at distances of up to 20 m. The biggest factor influencing the accuracy of the clearance height calculation is the modelling of the driving surface. To reduce outliers in the height calculation, the method can be stabilized by using calculations from earlier time steps. As a further stabilization measure, it is recommended to apply hysteresis to the obstacle output sent to the driver and the automatic emergency brake assistant. The system presented here is suitable for parking and maneuvering operations and is interesting as a cost-effective driver assistance system for cars with superstructures and light commercial vehicles.
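The hysteresis recommended above can be illustrated with a minimal sketch: the warning switches on when the estimated clearance margin drops below one threshold and switches off only once the margin rises above a second, larger threshold, so a noisy height estimate does not make the warning flicker. The class name, the two margin values and the sample measurements are hypothetical and are not taken from the thesis.

```python
class HysteresisWarning:
    """Two-threshold (hysteresis) switch for a clearance warning.

    on_margin_m:  warning turns ON when the margin falls below this value.
    off_margin_m: warning turns OFF only when the margin exceeds this value.
    Both thresholds are assumed values for illustration.
    """

    def __init__(self, on_margin_m, off_margin_m):
        if off_margin_m < on_margin_m:
            raise ValueError("off_margin_m must be >= on_margin_m")
        self.on_margin_m = on_margin_m
        self.off_margin_m = off_margin_m
        self.active = False

    def update(self, clearance_m, vehicle_height_m):
        """Feed one clearance estimate; return the current warning state."""
        margin = clearance_m - vehicle_height_m
        if not self.active and margin < self.on_margin_m:
            self.active = True
        elif self.active and margin > self.off_margin_m:
            self.active = False
        return self.active


# Made-up sequence of noisy clearance estimates for a 3.0 m high vehicle.
warning = HysteresisWarning(on_margin_m=0.20, off_margin_m=0.35)
estimates_m = [3.50, 3.18, 3.25, 3.30, 3.40, 3.22]
states = [warning.update(c, vehicle_height_m=3.0) for c in estimates_m]
print(states)
```

With a single threshold at 0.20 m, the third estimate (margin 0.25 m) would already have cleared the warning; with hysteresis it stays active until the margin exceeds 0.35 m.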
|
343 |
Interactive 3D Reconstruction / Interaktive 3D-Rekonstruktion
Schöning, Julius 23 May 2018
Applicable image-based reconstruction of three-dimensional (3D) objects offers many interesting industrial as well as private use cases, such as augmented reality, reverse engineering, 3D printing and simulation tasks. Unfortunately, image-based 3D reconstruction is not yet applicable to these quite complex tasks, since the resulting 3D models are single, monolithic objects without any division into logical or functional subparts.
This thesis aims at making image-based 3D reconstruction feasible such that captures of standard cameras can be used for creating functional 3D models. The research presented in the following does not focus on the fine-tuning of algorithms to achieve minor improvements, but evaluates the entire processing pipeline of image-based 3D reconstruction and tries to contribute at four critical points, where significant improvement can be achieved by advanced human-computer interaction:
(i) As the starting point of any 3D reconstruction process, the object of interest (OOI) that should be reconstructed needs to be annotated. For this task, novel pixel-accurate OOI annotation as an interactive process is presented, and an appropriate software solution is released. (ii) To improve the interactive annotation process, traditional interface devices, like mouse and keyboard, are supplemented with human sensory data to achieve closer user interaction. (iii) In practice, a major obstacle is the so far missing standard for file formats for annotation, which leads to numerous proprietary solutions. Therefore, a uniform standard file format is implemented and used for prototyping the first gaze-improved computer vision algorithms. As a sideline of this research, analogies between the close interaction of humans and computer vision systems and 3D perception are identified and evaluated. (iv) Finally, to reduce the processing time of the underlying algorithms used for 3D reconstruction, the ability of artificial neural networks to reconstruct 3D models of unknown OOIs is investigated.
In summary, the improvements achieved show that applicable image-based 3D reconstruction is within reach, but is currently feasible only with the support of human-computer interaction. Two software solutions, one for visual video analytics and one for spare part reconstruction, are implemented.
In the future, automated 3D reconstruction that produces functional 3D models can be achieved only when algorithms become capable of acquiring semantic knowledge. Until then, the world knowledge provided to the 3D reconstruction pipeline by human-computer interaction is indispensable.
|
344 |
In pursuit of consumer-accessible augmented virtuality / En strävan efter konsumenttillgänglig augmented virtuality
Berggrén, Rasmus January 2017
This project examines the possibility of using existing software to develop Virtual Reality (VR) software that incorporates key aspects of objects in a user’s surroundings into a virtual environment, producing Augmented Virtuality (AV). A defining limitation is the requirement that the software be consumer-accessible, meaning it needs to run on a common smartphone with no additional equipment. Two related AV concepts were considered: shape reconstruction and positional tracking. Two categories of techniques were considered for taking the measurements of reality necessary to achieve those AV concepts using only a monocular RGB camera as sensor: monocular visual SLAM (mvSLAM) and Structure from Motion (SfM). Two lists of requirements were constructed, formalising the notions of AV and consumer-accessibility. A search process was then conducted, in which existing software packages were evaluated for their suitability for inclusion in a piece of software fulfilling all requirements. The evaluations of SfM systems were made in combination with Multi-View Stereo (MVS) systems – a necessary complement for achieving visible shape reconstruction using a system that outputs point clouds. After thoroughly evaluating a variety of software, it was concluded that consumer-accessible AV cannot currently be achieved by combining existing packages, due to several issues. While future hardware performance increases and new software implementations would solve complexity and availability issues, some inaccuracy and usability issues are inherent to the limitation of using a monocular camera. / Detta projekt är en undersökning av möjligheten att använda befintlig programvara till att utveckla Virtual Reality (VR)-programvara som infogar framstående aspekter av objekt från en användares omgivning in i en virtuell miljö och därmed skapar Augmented Virtuality (AV).
En definierande begränsning är kravet på att programvaran skall vara konsumenttillgänglig, vilket innebär att den behöver kunna köras på en vanlig smartphone utan extra utrustning. Två besläktade AV-koncept beaktades: formrekonstruktion och positionsspårning. Två kategorier av tekniker togs i beaktande, vilka kunde användas för att göra de uppmätningar av verkligheten som var nödvändiga för att uppnå de tänkta AV-koncepten med hjälp av endast en monokulär RGB-kamera som sensor: monocular visual SLAM (mvSLAM) och Structure from Motion (SfM). Två listor med kriterier konstruerades, vilka formaliserade begreppen AV och konsumenttillgänglighet. En sökprocess utfördes sedan, där befintliga programvarupaket utvärderades för sin lämplighet att inkluderas i en programvara som uppfyllde alla kriterier. Utvärderingarna av SfM-system gjordes i kombination med Multi-View Stereo (MVS)-system – ett nödvändigt komplement för att uppnå synlig formrekonstruktion med ett system vars utdata är punktmoln. Efter att noggrant ha utvärderat en mängd programvara var slutsatsen att konsumenttillgänglig AV inte för närvarande kan uppnås genom att kombinera befintliga programvarupaket, på grund av ett antal olika problem. Medan framtida prestandaökningar hos maskinvara och nya programvarutillämpningar skulle lösa problem med komplexitet och tillgänglighet, är vissa problem med tillförlitlighet och användbarhet inneboende hos begränsningen till att använda en monokulär kamera.
|
345 |
Differentiable world programs
Jatavallabhul, Krishna Murthy 01 1900
L'intelligence artificielle (IA) moderne a ouvert de nouvelles perspectives prometteuses pour la création de robots intelligents. En particulier, les architectures d'apprentissage basées sur le gradient (réseaux neuronaux profonds) ont considérablement amélioré la compréhension des scènes 3D en termes de perception, de raisonnement et d'action.
Cependant, ces progrès ont affaibli l'attrait de nombreuses techniques "classiques" développées au cours des dernières décennies.
Nous postulons qu'un mélange de méthodes "classiques" et "apprises" est la voie la plus prometteuse pour développer des modèles du monde flexibles, interprétables et exploitables : une nécessité pour les agents intelligents incorporés.
La question centrale de cette thèse est : "Quelle est la manière idéale de combiner les techniques classiques avec des architectures d'apprentissage basées sur le gradient pour une compréhension riche du monde 3D ?". Cette vision ouvre la voie à une multitude d'applications qui ont un impact fondamental sur la façon dont les agents physiques perçoivent et interagissent avec leur environnement. Cette thèse, appelée "programmes différentiables pour modéliser l'environnement", unifie les efforts de plusieurs domaines étroitement liés mais actuellement disjoints, notamment la robotique, la vision par ordinateur, l'infographie et l'IA.
Ma première contribution, gradSLAM, est un système de localisation et de cartographie simultanées (SLAM) dense et entièrement différentiable. En permettant le calcul du gradient à travers des composants autrement non différentiables tels que l'optimisation non linéaire par moindres carrés, le raycasting, l'odométrie visuelle et la cartographie dense, gradSLAM ouvre de nouvelles voies pour intégrer la reconstruction 3D classique et l'apprentissage profond.
Ma deuxième contribution - taskography - propose une sparsification conditionnée par la tâche de grandes scènes 3D encodées sous forme de graphes de scènes 3D. Cela permet aux planificateurs classiques d'égaler (et de surpasser) les planificateurs de pointe basés sur l'apprentissage en concentrant le calcul sur les attributs de la scène pertinents pour la tâche.
Ma troisième et dernière contribution, gradSim, est un simulateur entièrement différentiable qui combine des moteurs physiques et graphiques différentiables pour permettre l'estimation des paramètres physiques et le contrôle visuomoteur, uniquement à partir de vidéos ou d'une image fixe. / Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action.
However, these advancements have diminished the appeal of many "classical" techniques developed over the last few decades.
We postulate that a blend of "classical" and "learned" methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents.
The central question of this dissertation is: "What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?" This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment. This dissertation, dubbed "differentiable world programs", unifies efforts from multiple closely related but currently disjoint fields, including robotics, computer vision, computer graphics, and AI.
Our first contribution, gradSLAM, is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning.
Our second contribution, taskography, proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes.
Our third and final contribution, gradSim, is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image.
|
346 |
Augmented Reality-Assisted Techniques for Sustainable Lithium-Ion EV Battery Dismantling / Förstärkt Verklighet-Assisterade Teknikers för Hållbar Demontering av Litiumjonbatterier
Cristina Culincu, Diana January 2023
The increasing adoption of electric vehicles (EVs) brings forth the challenge of effectively managing the second-life and end-of-life cycles of lithium-ion batteries. Augmented Reality (AR) offers a promising way to dismantle these batteries sustainably and efficiently. This thesis explores the development and evaluation of an AR mobile app specifically designed for guiding the dismantling process of a Volkswagen (VW) ID.4 lithium-ion EV battery. A detailed end-to-end development pipeline is presented, spanning from identifying the correct dismantling steps and building complete 3D reconstructions of the ID.4 battery using photogrammetry and CAD or 3D modelling, to creating an AR mobile application in Unity with the help of Vuforia that allows users to visualize the disassembly steps through an interactive guide. Tracking recognition test results for each model indicate that simpler models have a higher chance of producing false positives, while composite models have a greater minimum recognition distance than their faithful-to-real-life one-piece counterparts. User testing is conducted using a hybrid approach, combining a Figma prototype with video recordings to replicate the app’s behavior in a safe environment, without the physical presence of a high-voltage battery. Results show positive user feedback, demonstrating the app’s usability and effectiveness in guiding the dismantling process. Furthermore, the thesis evaluates the app’s performance through the System Usability Scale (SUS) and the Technology Acceptance Model. The obtained SUS score of 80 (Grade B - Good) indicates favorable usability, while the Technology Acceptance Model provides insights into potential users’ perceptions. / Den ökande användningen av elektriska fordon (EV) frambringar utmaningen att effektivt hantera andra livscykler och slutlivscykler för litiumjonbatterier.
För att hållbart och effektivt demontera dessa batterier erbjuder Augmented Reality (AR) en lovande lösning. Denna uppsats utforskar utvecklingen och utvärderingen av en AR-mobilapplikation som specifikt är utformad för att guida demonteringsprocessen av ett Volkswagen (VW) ID.4 litiumjon EVbatteri. Därefter presenteras en detaljerad genomgående utvecklingsprocess, som sträcker sig från att identifiera korrekta demonteringssteg och skapa kompletta 3D-rekonstruktioner av ID.4-batteriet med hjälp av fotogrammetri och CAD eller 3D-modellering, till att skapa en AR-mobilapplikation i Unity med hjälp av Vuforia, som tillåter användare att visualisera demonteringsstegen genom en interaktiv guide. Resultaten bättre identifieringstester för varje modell indikerar att enklare modeller har större chans att producera falska positiva resultat, medan komplexa modeller har större minsta igenkänningsavstånd jämfört med helhetsmodeller som är trogna verkligheten. Användartester genomförs med hjälp av en hybridmetod som kombinerar en Figma-prototyp med videoinspelningar för att återskapa appens beteende i en säker miljö, utan att behöva ha ett högspänningsbatteri fysiskt närvarande. Resultaten visar positivt användarfeedback och bekräftar appens användarvänlighet och effektivitet vid guidning av demonteringsprocessen. Uppsatsen utvärderar också appens prestanda genom System Usability Scale (SUS) och Technology Acceptance Model. Den erhållna SUS-poängen på 80 (Betyg B - Bra) indikerar en god användbarhet, medan Technology Acceptance Model ger insikter om potentiella användares uppfattningar.
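As background for the SUS score reported above, the sketch below applies the standard SUS scoring rule to a single questionnaire: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. The function name `sus_score` and the sample responses are hypothetical; the thesis's individual responses are not given in the abstract, and a reported study score is typically the mean over participants.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) for one participant.

    responses: ten integers from 1 (strongly disagree) to 5 (strongly agree),
    in questionnaire order. Hypothetical helper for illustration only.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for index, response in enumerate(responses):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5


# Made-up responses from a single participant.
example = [4, 2, 4, 2, 4, 2, 4, 2, 5, 1]
print(sus_score(example))
```

For these made-up responses the ten item contributions sum to 32, giving a score of 80.0.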
|
347 |
Segmentation des images radiographiques à rayon-X basée sur la fusion entropique et Reconstruction 3D biplanaire des os basée sur la modélisation statistique non-linéaire
Nguyen, Dac Cong Tai 08 1900
Dans cette thèse, nous présentons une méthode de segmentation d’images radiographiques des membres inférieurs en régions d’intérêt (ROIs), une méthode de recalage rigide tridimensionnel (3D) / bidimensionnel (2D) des prothèses du genou sur les deux images biplanaires radiographiques calibrées et une méthode de reconstruction 3D des membres inférieurs à partir de deux images biplanaires radiographiques calibrées.
Le premier article présente une méthode de segmentation de rotule, astragale et bassin des images radiographiques en régions d’intérêt basée sur la fusion de multi-atlas et superpixels. Cette méthode utilise l’apprentissage d’une base de données d’images radiographiques de ces os segmentées manuellement et recalées entre elles pour estimer un ensemble de superpixels permettant de tenir compte de toute la variabilité locale et non linéaire existante dans la base, puis la propagation d’étiquettes basée sur le concept d’entropie pour raffiner la carte de segmentations en régions internes afin d’obtenir le résultat final.
Le deuxième article présente une méthode de recalage rigide 3D / 2D des composants tibiaux et fémoraux de prothèse du genou sur deux images biplanaires radiographiques calibrées. Cette méthode utilise une mesure de similarité hybride basée sur les notions de contours et régions puis un algorithme d’optimisation stochastique pour estimer la position des composants. La similarité basée sur les régions est stable et robuste contre les bruits. Cependant, cette mesure n’est pas précise car le nombre de pixels aux contours est inférieur au celui à l’intérieur de la région. Au contraire, la similarité basée sur les contours est précise mais plus sensible au bruit ou à d’autres artefacts existant dans les images. C’est pourquoi la combinaison de ces deux similarités fournit une méthode de recalage robuste et précise.
Le troisième article représente une méthode statistique biplanaire de reconstruction 3D de rotule, astragale et bassin. Cette méthode utilise un algorithme de réduction de dimensionnalité pour définir un modèle déformable paramétrique qui contient toutes les déformations statistiques admissibles apprises à partir d’une base de données des structures osseuses. Puis un algorithme d’optimisation stochastique est utilisé pour minimiser la différence entre la projection des contours / régions des modèles surfaciques osseux avec ceux segmentés sur les deux images radiographiques. / In this thesis, we present a method for segmenting X-ray images of the lower limbs into regions of interest (ROIs), a three-dimensional (3D) / two-dimensional (2D) rigid registration method for aligning knee implant components to two calibrated biplanar X-ray images, and a method for 3D reconstruction of the lower limbs from two calibrated biplanar X-ray images.
The first paper presents a superpixel- and multi-atlas-based method for segmenting the patella, talus, and pelvis in X-ray images into regions of interest. The method uses a training dataset of manually pre-segmented and co-registered X-ray images of these bones to estimate a collection of superpixels that accounts for all the local, nonlinear variability in the dataset; an entropy-based label propagation then refines the segmentation map into internal regions to produce the final result.
The second paper presents a 3D / 2D rigid registration method of tibial and femoral components of knee implants to calibrated biplanar X-ray images. This method uses a hybrid edge- and region-based similarity measure together with a stochastic optimization algorithm to estimate the component positions. The region-based similarity is stable and robust to noise, but it is not precise because the border contains fewer pixels than the interior of the region. By contrast, the edge-based similarity is accurate but more sensitive to noise and other artifacts in the images. Combining the two similarity types therefore yields a robust and accurate registration method.
The third paper presents a statistical biplanar 3D reconstruction method of the patella, talus, and pelvis. This method uses a dimensionality reduction algorithm to define a deformable parametric model which contains all admissible statistical deformations learned from the bone structure dataset. Then a stochastic optimization algorithm is used to minimize the difference between the contour / region projection of bone models and the contours / regions in two segmented X-ray images.
|