1 |
Active Stereo Reconstruction using Deep Learning. Kihlström, Helena. January 2019.
Depth estimation from stereo images is an important task in many computer vision applications. A stereo camera contains two image sensors that observe the scene from slightly different viewpoints, making it possible to recover the depth of the scene. An active stereo camera additionally uses a laser projector that projects a pattern into the scene; the added texture yields better depth estimates in dark and textureless areas. Recently, deep learning methods have provided new solutions with state-of-the-art performance in stereo reconstruction. The aim of this project was to investigate the behavior of a deep learning model for active stereo reconstruction when it is applied to data from different cameras. The model is self-supervised, which avoids the need for large amounts of ground-truth training data: it instead uses the known relationship between the left and right images to let the model learn the best estimate. The model was trained separately on datasets from three different active stereo cameras, and the three trained models were then compared using evaluation images from all three cameras. The results showed that a model did not always perform best on images from the camera used to collect its training data. However, when comparing different models on the same test images, the model trained on images from the test camera gave better results in most cases.
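The known left-right relationship used for self-supervision is often expressed as a photometric reconstruction loss: warp the right image into the left view with the predicted disparity and penalize the difference. A minimal NumPy sketch under that reading (function names are ours, not from the thesis; real models use differentiable bilinear sampling rather than nearest-neighbour lookup):

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Sample the right image at x - d to reconstruct the left view.

    right: (H, W) grayscale image; disparity: (H, W) predicted disparities.
    Nearest-neighbour sampling keeps the sketch short.
    """
    H, W = right.shape
    xs = np.arange(W)[None, :] - np.round(disparity).astype(int)
    xs = np.clip(xs, 0, W - 1)          # clamp at the disoccluded border
    return np.take_along_axis(right, xs, axis=1)

def photometric_loss(left, right, disparity):
    """Mean absolute error between the left image and the warped right image."""
    return np.abs(left - warp_right_to_left(right, disparity)).mean()

# Toy example: the right image is the left image shifted by 3 pixels,
# so the true disparity (3 everywhere) gives a much lower loss than 0.
left = np.tile(np.arange(10.0), (4, 1))
right = np.roll(left, -3, axis=1)
d_true = np.full((4, 10), 3.0)
```

Training then amounts to minimizing this loss with respect to the network that predicts the disparity, so no ground-truth depth is ever needed.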
|
2 |
Real-time stereo reconstruction using hierarchical dynamic programming and LULU filtering. Singels, Francois. 2010.
Thesis (MSc (Mathematics)), University of Stellenbosch, 2010. ENGLISH ABSTRACT: In this thesis we consider the essential topics relating to stereo vision and the correspondence
problem in general. The aim is to reconstruct a dense 3D scene from
images captured by two spatially related cameras. Our main focus, however, is on
speed and real-time implementation on a standard desktop PC. We wish to use
the CPU to solve the correspondence problem and to reserve the GPU for model
rendering. We discuss three fundamental types of algorithms and evaluate their
suitability to this end. We eventually choose to implement a hierarchical version of
the dynamic programming algorithm, because of the good balance between accuracy
and speed. As we build our system from the ground up we gradually introduce
necessary concepts and established geometric principles, common to most stereovision
systems, and discuss them as they become relevant. It becomes clear that the
greatest weakness of the hierarchical dynamic programming algorithm is scanline
inconsistency. We find that the one-dimensional LULU-filter is computationally inexpensive
and effective at removing outliers when applied across the scanlines. We
take advantage of the hierarchical structure of our algorithm and sub-pixel refinement
to produce results at video rates (roughly 20 frames per second). A 3D model
is also constructed at video rates in an on-line system with only a small delay between
obtaining the input images and rendering the model. Not only is the quality
of our results highly competitive with those of other state of the art algorithms, but
the achievable speed is also considerably faster.
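The width-1 LULU filter pair used above to remove scanline outliers can be written in a few lines: L suppresses upward impulses (a max of windowed minima) and U suppresses downward ones (a min of windowed maxima). A pure-Python sketch, with edge handling by replication (our choice, not necessarily the thesis's):

```python
def lulu(signal):
    """One pass of the width-1 LULU filter: removes single-sample
    impulses in either direction while preserving genuine edges."""
    def L(x):
        # lower envelope: max of the two windowed minima around each sample
        padded = [x[0]] + list(x) + [x[-1]]
        return [max(min(padded[i], padded[i + 1]), min(padded[i + 1], padded[i + 2]))
                for i in range(len(x))]

    def U(x):
        # upper envelope: min of the two windowed maxima around each sample
        padded = [x[0]] + list(x) + [x[-1]]
        return [min(max(padded[i], padded[i + 1]), max(padded[i + 1], padded[i + 2]))
                for i in range(len(x))]

    return U(L(signal))

# A single upward spike and a single downward spike are both removed:
# lulu([1, 1, 9, 1, 1]) and lulu([5, 5, 0, 5, 5]) return flat signals.
```

Because each pass is a handful of comparisons per sample, applying it across scanlines adds almost nothing to the per-frame budget, which is consistent with the abstract's emphasis on speed.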
|
3 |
Sensitivity Analysis of Virtual Terrain Accuracy for Vision Based Algorithms. Marc, Róbert. January 2012.
A number of three-dimensional virtual environments are available for developing vision-based robotic capabilities. They offer repeated trials at low cost compared to field testing, but they still suffer from a lack of realism and credibility for validation and verification. This work consists of the creation and validation of state-of-the-art virtual terrains for research on vision-based navigation algorithms for Martian rovers. This Master's thesis focuses on the creation of virtual environments that are exact imitations of the planetary terrain testbed at the European Space Agency's ESTEC site. Two different techniques are used to recreate the Martian-like site in a simulator. The first method uses a novel multi-view stereo reconstruction technique; the second uses a high-precision laser scanning system to accurately map the terrain. The real environment is compared to the virtual environments at the exact same locations using captured stereo camera images, and the differences are characterized with well-known feature detectors (e.g., SURF and SIFT). The present work led to the creation and validation of a database of highly realistic virtual Mars-like terrains for the verification of vision-based control algorithms.
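Before running full feature detectors such as SURF or SIFT, a simple way to quantify how closely a rendered view matches the corresponding real image at the same location is zero-mean normalized cross-correlation. A sketch of that baseline comparison (our illustration, not the thesis's actual pipeline):

```python
import numpy as np

def ncc(real, virtual):
    """Zero-mean normalized cross-correlation between two equally sized
    grayscale images; 1.0 means a perfect match up to brightness and
    contrast, values near 0 mean the images are unrelated."""
    a = real - real.mean()
    b = virtual - virtual.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

A score of 1.0 for a rendered view captured at the same pose as the real photograph would indicate the virtual terrain reproduces the radiometry as well as the geometry; feature-detector repeatability then probes finer-grained differences.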
|
4 |
Performance Evaluation of Stereo Reconstruction Algorithms on NIR Images. Vidas, Dario. January 2016.
Stereo vision is one of the most active research areas in computer vision. While hundreds of stereo reconstruction algorithms have been developed, little work has been done on evaluating such algorithms, and almost none on evaluation on near-infrared (NIR) images. Of almost a hundred algorithms examined, we selected a set of 15, mostly with real-time performance, which were then categorized and evaluated on several NIR image datasets, including single-stereo-pair and stream datasets. The accuracy and run time of each algorithm are measured and compared, giving insight into which categories of algorithms perform best on NIR images and which algorithms may be candidates for real-time applications. Our comparison indicates that adaptive support-weight and belief propagation algorithms have the highest accuracy of all fast methods, but also longer run times (2-3 seconds). On the other hand, faster algorithms (those achieving 30 or more fps on a single thread) usually perform an order of magnitude worse when measured by the percentage of incorrectly computed pixels.
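The "percentage of incorrectly computed pixels" is commonly computed as the bad-pixel rate: the fraction of pixels whose disparity deviates from ground truth by more than a threshold. A minimal sketch, assuming NaN marks pixels without ground truth and a 1 px threshold (both conventions are assumptions on our part, not stated in the abstract):

```python
import numpy as np

def bad_pixel_rate(disparity, ground_truth, threshold=1.0):
    """Fraction of valid pixels whose absolute disparity error exceeds
    `threshold`. Pixels without ground truth (encoded as NaN) are
    excluded, as is common in stereo benchmarks."""
    valid = ~np.isnan(ground_truth)
    errors = np.abs(disparity[valid] - ground_truth[valid])
    return float((errors > threshold).mean())
```

Ranking algorithms by this rate alongside their measured run times is exactly the accuracy-versus-speed comparison the abstract reports.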
|
5 |
Fusing Stereo Measurements into a Global 3D Representation. Blåwiik, Per. January 2021.
This report describes a thesis project aimed at fusing an arbitrary sequence of stereo measurements into a global 3D representation in real time. The proposed method uses an octree-based signed distance function to represent the 3D environment; the geometric data is fused with a cumulative weighted update function and rendered by incremental mesh extraction using the marching cubes algorithm. The result of the project was a prototype system, integrated into a real-time stereo reconstruction system, which was evaluated by benchmark tests as well as qualitative comparisons with an older method based on overlapping meshes. (The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.)
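Cumulative weighted updates in signed-distance fusion are typically a running weighted mean per voxel: D is replaced by (W·D + w·d)/(W + w) and W by W + w when a new distance sample d with weight w arrives. A per-voxel sketch under that common formulation (the thesis's exact weighting scheme may differ):

```python
def fuse(D, W, d, w):
    """Fold one new signed-distance sample (d, w) into the stored voxel
    value D with accumulated weight W; returns the updated pair (D', W').

    With W starting at 0, repeated calls maintain the weighted mean of
    all samples seen so far, so reliable measurements (large w) dominate.
    """
    W_new = W + w
    return (W * D + w * d) / W_new, W_new

# Example: an empty voxel absorbs samples 1.0 and 3.0 with equal weight,
# ending at the mean distance 2.0 with accumulated weight 2.0.
D, W = fuse(0.0, 0.0, 1.0, 1.0)
D, W = fuse(D, W, 3.0, 1.0)
```

Because the update is a constant-time fold per voxel, it suits the real-time incremental setting: each new stereo measurement touches only the octree cells near the observed surface.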
|
6 |
Multi-scale Methods for Omnidirectional Stereo with Application to Real-time Virtual Walkthroughs. Brunton, Alan P. 28 November 2012.
This thesis addresses a number of problems in computer vision, image processing, and geometry processing, and presents novel solutions to these problems. The overarching theme of the techniques presented here is a multi-scale approach, leveraging mathematical tools to represent images and surfaces at different scales, and methods that can be adapted from one type of domain (e.g., the plane) to another (e.g., the sphere). The main problem addressed in this thesis is known as stereo reconstruction: reconstructing the geometry of a scene or object from two or more images of that scene. We develop novel algorithms to do this, which work for both planar and spherical images. By developing a novel way to formulate the notion of disparity for spherical images, we are able to effectively adapt our algorithms from planar to spherical images. Our stereo reconstruction algorithm is based on a novel application of distance transforms to multi-scale matching. We use matching information aggregated over multiple scales, and enforce consistency between these scales using distance transforms. We then show how multiple spherical disparity maps can be efficiently and robustly fused using visibility and other geometric constraints. We then show how the reconstructed point clouds can be used to synthesize, in real time, a realistic sequence of novel views: images from points of view not captured in the input images. Along the way to this result, we address some related problems. For example, multi-scale features can be detected in spherical images by convolving those images with a filterbank, generating an overcomplete spherical wavelet representation of the image from which the multi-scale features can be extracted. Convolution of spherical images is much more efficient in the spherical harmonic domain than in the spatial domain. Thus, we develop a GPU implementation for fast spherical harmonic transforms and frequency-domain convolutions of spherical images.
This tool can also be used to detect multi-scale features on geometric surfaces. When we have a point cloud of a surface of a particular class of object, whether generated by stereo reconstruction or by some other modality, we can use statistics and machine learning to more robustly estimate the surface. If we have at our disposal a database of surfaces of a particular type of object, such as the human face, we can compute statistics over this database to constrain the possible shape a new surface of this type can take. We show how a statistical spherical wavelet shape prior can be used to efficiently and robustly reconstruct a face shape from noisy point cloud data, including stereo data.
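The distance transforms used above to enforce consistency can be illustrated by their simplest one-dimensional L1 form, computable in two linear passes: out[i] = min over j of (costs[j] + |i - j|). This is a simplified stand-in for the thesis's multi-scale machinery, shown only to make the operation concrete:

```python
def distance_transform_1d(costs):
    """L1 distance transform of a cost array:
    out[i] = min_j (costs[j] + |i - j|), computed in two linear passes.
    Lowering a cell's cost to 'nearly as good as a cheap neighbour'
    is how distance transforms soften hard matching costs."""
    out = list(costs)
    for i in range(1, len(out)):             # forward pass: propagate left-to-right
        out[i] = min(out[i], out[i - 1] + 1)
    for i in range(len(out) - 2, -1, -1):    # backward pass: propagate right-to-left
        out[i] = min(out[i], out[i + 1] + 1)
    return out

# Two confident matches (cost 0) at the ends pull down the costs between
# them: [0, 10, 10, 10, 0] becomes [0, 1, 2, 1, 0].
```

In matching terms, a disparity hypothesis that disagrees with a confident neighbouring match is penalized in proportion to how far it strays, which is the consistency pressure the abstract describes between scales.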
|
8 |
Strukturelle Ansätze für die Stereorekonstruktion / Structural Approaches for Stereo Reconstruction. Shlezinger, Dmytro. 15 August 2005.
The dissertation studies the class of labeling problems. This field forms an important part of structural pattern recognition, in which the structure of the object to be recognized is explicitly taken into account. The developed theory is applied to the practical problem of stereo reconstruction.
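A labeling problem in this sense assigns each site a label so as to minimize an energy with unary (data) and pairwise (structure) terms; on a chain, such as one stereo scanline labeled with disparities, it can be solved exactly by dynamic programming. A toy sketch of the problem class (our illustration, not the dissertation's algorithm):

```python
def chain_labeling(unary, smooth):
    """Exact minimization of a chain-structured labeling energy
        E(l) = sum_i unary[i][l_i] + smooth * sum_i |l_i - l_{i+1}|
    by dynamic programming (Viterbi). unary[i][l] is the data cost of
    label l at site i; smooth weights label changes between neighbours.
    """
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])      # best energy ending at site 0 with each label
    back = []                  # backpointers for recovering the argmin
    for i in range(1, n):
        ptr, new = [0] * k, [0.0] * k
        for l in range(k):
            best, arg = min((cost[p] + smooth * abs(l - p), p) for p in range(k))
            new[l] = unary[i][l] + best
            ptr[l] = arg
        cost, back = new, back + [ptr]
    # backtrack the optimal labeling from the cheapest final label
    l = min(range(k), key=lambda j: cost[j])
    labels = [l]
    for ptr in reversed(back):
        l = ptr[l]
        labels.append(l)
    return labels[::-1]
```

With no smoothness the data term wins everywhere; with a strong smoothness weight the chain prefers a constant labeling even against the data, which is exactly the structural bias these models encode.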
|
9 |
Methods for image-based 3-D modeling using color and depth cameras. Ylimäki, M. (Markus). 5 December 2017.
Abstract
This work addresses problems related to the three-dimensional modeling of scenes and objects, and to model evaluation. It is divided into four main parts. The first concentrates on purely image-based reconstruction, while the second presents a modeling pipeline based on an active depth sensor. The third introduces methods for producing surface meshes from point clouds, and the fourth presents a novel approach to model evaluation.
In the first part, this work proposes a multi-view stereo (MVS) reconstruction method that takes a set of images as input and outputs a model represented as a point cloud. The method is based on match propagation, where a set of initial corresponding points between images is expanded iteratively into larger regions by searching for new correspondences in the spatial neighborhood of existing ones. The expansion uses a best-first strategy, where the most reliable match is always expanded first. The method produces results comparable with the state of the art, but significantly faster.
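Best-first expansion maps naturally onto a priority queue keyed by match reliability. A skeleton of that propagation loop, where `score`, `neighbors`, and the 0.5 reliability threshold are all illustrative assumptions rather than the thesis's actual matcher:

```python
import heapq

def propagate(seeds, score, neighbors):
    """Best-first match propagation: grow a set of correspondences by
    always expanding the most reliable accepted match first.

    seeds: iterable of (position, score) pairs to start from.
    score(pos): reliability of a candidate correspondence in [0, 1].
    neighbors(pos): candidate positions near an accepted match.
    """
    heap = [(-s, p) for p, s in seeds]   # max-heap via negated scores
    heapq.heapify(heap)
    accepted = {p for p, _ in seeds}
    while heap:
        _, pos = heapq.heappop(heap)     # most reliable frontier match
        for cand in neighbors(pos):
            if cand not in accepted:
                s = score(cand)
                if s > 0.5:              # reliability threshold (assumed)
                    accepted.add(cand)
                    heapq.heappush(heap, (-s, cand))
    return accepted
```

Expanding reliable matches first means unreliable regions are only reached, if at all, through trusted neighbours, which is what keeps propagation both fast and robust.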
In the second part, this work presents a method that merges a sequence of depth maps into a single non-redundant point cloud. In areas where the depth maps overlap, the method fuses points together, giving more weight to points that appear more reliable. The method outperforms its predecessor in both accuracy and robustness. In addition, this part introduces a method for depth camera calibration that builds on an existing calibration approach originally designed for the first-generation Microsoft Kinect device.
The third part of the thesis addresses the problem of converting the point clouds to surface meshes. The work briefly reviews two well-known approaches and compares their ability to produce sparse mesh models without sacrificing accuracy.
Finally, the fourth part of this work describes the development of a novel approach for the performance evaluation of reconstruction algorithms. In addition to accuracy and completeness, which are the metrics commonly used in existing evaluation benchmarks, the method also takes the compactness of the models into account. The metric enables the evaluation of the accuracy-compactness trade-off of the models.
|