1 |
Wide baseline matching with applications to visual servoing. Tell, Dennis, January 2002.
No description available.
|
3 |
3D reconstruction of the dynamic environment surrounding a vehicle using a heterogeneous multi-camera system in wide-baseline stereo. Mennillo, Laurent, 5 June 2019.
This Ph.D. thesis was carried out in the automotive industry, in association with Renault Group, and mainly concerns the development of advanced driver-assistance systems and autonomous vehicles. The progress made by the scientific community during the last decades in the fields of computer science and robotics has been so important that it now enables the implementation of complex embedded systems in vehicles. These systems, primarily designed to assist drivers in simple driving scenarios and emergencies, now aim to offer fully autonomous transport. Multibody SLAM methods currently used in autonomous vehicles often rely on high-performance but expensive onboard sensors such as LIDAR. Digital cameras, on the other hand, are much cheaper, which has led to their increased use in newer vehicles to provide driver-assistance functions such as parking assistance or emergency braking. This now relatively common implementation also makes it possible to consider using them to reconstruct the dynamic environment surrounding a vehicle in three dimensions. From a scientific point of view, existing multibody visual SLAM techniques can be divided into two categories. The first and older category concerns stereo methods, which use several cameras with overlapping fields of view to reconstruct the observed dynamic scene. Most of these methods use identical stereo pairs with a short baseline, which allows dense matching of feature points and the estimation of disparity maps that are then used to segment the motion of the reconstructed points. The other category concerns monocular methods, which use only one camera during the reconstruction process and must therefore compensate for the ego-motion of the acquisition system in order to estimate the motion of the other moving objects independently.
These monocular methods are harder, as they must address several additional problems: motion segmentation, which consists in clustering the initial data into separate subspaces representing the individual motion of each object, but also the estimation of the relative reconstruction scale of these objects before their aggregation into the static scene. The industrial motivation for this work is the reuse of multi-camera systems already present in production vehicles, mostly composed of a front camera and several surround fisheye cameras in wide-baseline stereo; this led to the development of a multibody reconstruction method dedicated to such heterogeneous systems. The proposed method is incremental and reconstructs sparse mobile points together with their trajectories, using several geometric constraints to segment the reconstructed points and their motion. Finally, a quantitative and qualitative evaluation was conducted on two separate datasets, one of which was created during this work to present characteristics similar to existing heterogeneous multi-camera systems.
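The stereo reconstruction underlying both categories ultimately rests on triangulating a point observed in two views with known camera poses. A minimal sketch of linear (DLT) triangulation with standard pinhole projection matrices, given purely for illustration (it is not the thesis's method):

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v).
    Returns the 3D point in Euclidean coordinates.
    """
    # Each view contributes two rows of the homogeneous system A X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the smallest
    # singular value, dehomogenized to Euclidean coordinates.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

In a wide-baseline setting the two projection matrices differ substantially, which improves triangulation conditioning but makes the prerequisite matching step harder, as the abstracts above emphasize.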
|
4 |
Wide Baseline Stereo Image Rectification and Matching. Hao, Wei, 1 December 2011.
Perception of depth information is central to three-dimensional (3D) vision problems. Stereopsis is an important passive vision technique for depth perception. Wide baseline stereo is a challenging problem that has recently attracted much interest from both the theoretical and application perspectives. In this research we approach the problem of wide baseline stereo using the geometric and structural constraints within feature sets.
The major contribution of this dissertation is that we proposed and implemented a more efficient paradigm to handle the challenges introduced by perspective distortion in wide baseline stereo, compared to the state-of-the-art. To facilitate the paradigm, a new feature-matching algorithm that extends the state-of-the-art matching methods to larger baseline cases is proposed. The proposed matching algorithm takes advantage of both the local feature descriptor and the structure pattern of the feature set, and enhances the matching results in the case of large viewpoint change.
In addition, an innovative rectification method for uncalibrated images is proposed to make dense wide baseline stereo matching possible. We noticed that existing rectification methods do not take the need for shape adjustment into account. By introducing geometric constraints on the pattern of the feature points, we propose a rectification method that maximizes structural congruency based on Delaunay triangulation nets, and thus avoids some problems of existing methods.
The rectified stereo images can then be used to generate a dense depth map of the scene. The task is much simpler than with some existing methods because the 2D search problem is reduced to a 1D search.
To validate the proposed methods, real world images are applied to test the performance and comparisons to the state-of-the-art methods are provided. The performance of the dense matching with respect to the changing baseline is also studied.
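The gain the abstract describes, reducing a 2D correspondence search to a 1D search along the scanline after rectification, can be illustrated with a minimal block-matching sketch using a simple SAD cost (an illustration only, not the dissertation's algorithm):

```python
import numpy as np

def disparity_1d(left, right, row, col, max_disp, win=3):
    """Find the disparity of left-image pixel (row, col) by a 1D SAD search.

    On rectified images, the match for a left-image pixel lies on the
    same row of the right image, so the search is one-dimensional.
    """
    half = win // 2
    patch = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cost = np.abs(patch - cand).sum()   # sum of absolute differences
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

Without rectification, the same matching cost would have to be evaluated over a 2D neighborhood (or along a slanted epipolar line), which is exactly the overhead rectification removes.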
|
5 |
Depth Estimation from Structured Light Fields. Li, Yan, 3 July 2020.
Light fields have become popular as a new geometric representation of 3D scenes. Composed of multiple views, they offer great potential for improving depth perception of a scene. Light fields can be captured by different camera sensors, and different acquisitions give rise to different representations: mainly a line of camera views (a 3D light field representation) or a grid of camera views (a 4D light field representation). When the capture positions are uniformly distributed, the output is a structured light field. This thesis focuses on depth estimation from structured light fields. Light field setups differ not only in dimensionality (3D versus 4D) but also in the density, or baseline, of the camera views. Rather than aiming only at reconstructing high-quality depth from dense (narrow-baseline) light fields, we pursue a more general objective: reconstructing depth from a wide range of light field setups. Hence a series of depth estimation methods for light fields, both traditional and deep learning-based, is presented in this thesis, with extra effort devoted to depth accuracy and computational efficiency. Specifically: 1) a robust traditional framework is put forward for estimating depth in sparse (wide-baseline) light fields, combining cost calculation, window-based filtering and optimization; 2) this framework is extended with new or alternative components to 4D light fields, and the extended framework is shown to be independent of the number of views and/or the baseline of the 4D light field when predicting depth; 3) two new deep learning-based methods are proposed for narrow-baseline light fields, where features are learned from Epipolar-Plane Images and light field images.
One of these methods is designed as a lightweight model for more practical goals; 4) to address the lack of suitable datasets, a large-scale and diverse synthetic wide-baseline dataset with labeled data is created, and a new lightweight deep model is proposed for wide-baseline 4D light fields. This model also works on narrow-baseline 4D light fields if trained on narrow-baseline datasets. Evaluations are made on public light field datasets. Experimental results show that the proposed depth estimation methods achieve high-quality depth across a wide range of light field setups, and some even outperform state-of-the-art methods. / Doctorate in Engineering Sciences and Technology
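The photo-consistency idea behind depth estimation from a line of views can be sketched as follows: for a fronto-parallel scene point, its image shifts in proportion to each camera's baseline, so the correct depth (here parameterized as a disparity slope) is the one that best aligns the shifted views. A toy illustration under that assumption, not the thesis's framework:

```python
import numpy as np

def depth_from_shifts(views, baselines, candidate_slopes):
    """Per-pixel disparity-slope estimation for a line of camera views.

    views: 2D grayscale images taken along a line; baselines: each
    camera's position on that line. For each candidate slope s the views
    are shifted by s * baseline and photo-consistency (variance across
    views) is measured; the slope with minimal variance wins per pixel.
    """
    costs = []
    for s in candidate_slopes:
        shifted = [np.roll(v, int(round(s * b)), axis=1)
                   for v, b in zip(views, baselines)]
        costs.append(np.var(np.stack(shifted), axis=0))
    best = np.argmin(np.stack(costs), axis=0)
    return np.asarray(candidate_slopes)[best]   # per-pixel best slope
```

In Epipolar-Plane-Image terms, the candidate slope corresponds to the slope of a line in the EPI, which is what the learning-based methods in the abstract extract features from.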
|
6 |
Image Based View Synthesis. Xiao, Jiangjian, 1 January 2004.
This dissertation deals with the image-based approach to synthesize a virtual scene using sparse images or a video sequence without the use of 3D models. In our scenario, a real dynamic or static scene is captured by a set of un-calibrated images from different viewpoints. After automatically recovering the geometric transformations between these images, a series of photo-realistic virtual views can be rendered and a virtual environment covered by these several static cameras can be synthesized. This image-based approach has applications in object recognition, object transfer, video synthesis and video compression. In this dissertation, I have contributed to several sub-problems related to image based view synthesis. Before image-based view synthesis can be performed, images need to be segmented into individual objects. Assuming that a scene can approximately be described by multiple planar regions, I have developed a robust and novel approach to automatically extract a set of affine or projective transformations induced by these regions, correctly detect the occlusion pixels over multiple consecutive frames, and accurately segment the scene into several motion layers. First, a number of seed regions using correspondences in two frames are determined, and the seed regions are expanded and outliers are rejected employing the graph cuts method integrated with level set representation. Next, these initial regions are merged into several initial layers according to the motion similarity. Third, the occlusion order constraints on multiple frames are explored, which guarantee that the occlusion area increases with the temporal order in a short period and effectively maintains segmentation consistency over multiple consecutive frames. Then the correct layer segmentation is obtained by using a graph cuts algorithm, and the occlusions between the overlapping layers are explicitly determined. 
Several experimental results are demonstrated to show that our approach is effective and robust. Recovering the geometric transformations among images of a scene is a prerequisite step for image-based view synthesis. I have developed a wide baseline matching algorithm to identify the correspondences between two un-calibrated images, and to further determine the geometric relationship between images, such as epipolar geometry or a projective transformation. In our approach, a set of salient features, edge-corners, is detected to provide robust and consistent matching primitives. Then, based on the Singular Value Decomposition (SVD) of an affine matrix, we effectively quantize the search space into two independent subspaces for rotation angle and scaling factor, and use a two-stage affine matching algorithm to obtain robust matches between the two frames. The experimental results on a number of wide baseline images strongly demonstrate that our matching method outperforms state-of-the-art algorithms even under significant camera motion, illumination variation, occlusion, and self-similarity. Given the wide baseline matches among images, I have developed a novel method for dynamic view morphing. Dynamic view morphing deals with scenes containing moving objects in the presence of camera motion. The objects can be rigid or non-rigid, and each of them can move in any orientation or direction. The proposed method can generate a series of continuous and physically accurate intermediate views from only two reference images without any 3D knowledge. The procedure consists of three steps: segmentation, morphing and post-warping. Given a boundary connection constraint, the source and target scenes are segmented into several layers for morphing. Based on the decomposition of the affine transformation between corresponding points, we uniquely determine a physically correct path for post-warping by the least-distortion method.
I have successfully generalized the dynamic scene synthesis problem from the simple scene with only rotation to the dynamic scene containing non-rigid objects. My method can handle dynamic rigid or non-rigid objects, including complicated objects such as humans. Finally, I have also developed a novel algorithm for tri-view morphing. This is an efficient image-based method to navigate a scene based on only three wide-baseline un-calibrated images without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images using our wide baseline matching method, an accurate trifocal plane is extracted from the trifocal tensor implied in these three images. Next, employing a trinocular-stereo algorithm and barycentric blending technique, we generate an arbitrary novel view to navigate the scene in a 2D space. Furthermore, after self-calibration of the cameras, a 3D model can also be correctly augmented into this virtual environment synthesized by the tri-view morphing algorithm. We have applied our view morphing framework to several interesting applications: 4D video synthesis, automatic target recognition, multi-view morphing.
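The SVD-based split of an affine matrix into independent rotation and scale subspaces, which the matching algorithm above quantizes for its search, can be sketched for the 2x2 linear part (a minimal illustration; the full method searches over these quantized subspaces):

```python
import numpy as np

def rotation_and_scale(A):
    """Split a 2x2 affine linear part into a rotation angle and a scale.

    Uses the SVD A = U S V^T: the nearest pure rotation is R = U V^T
    (sign-corrected to det = +1) and the overall scale is taken as the
    mean singular value.
    """
    U, S, Vt = np.linalg.svd(A)
    R = U @ Vt
    if np.linalg.det(R) < 0:          # enforce a proper rotation
        U[:, -1] *= -1
        R = U @ Vt
    angle = np.arctan2(R[1, 0], R[0, 0])
    return angle, S.mean()
```

Quantizing the angle and scale separately turns one joint 2D search over affine parameters into two small 1D searches, which is the efficiency argument implicit in the abstract.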
|
7 |
Line Matching in a Wide-Baseline Stereoview. Al-Shahri, Mohammed, January 2013.
No description available.
|
8 |
Two View Line-Based Matching, Motion Estimation and Reconstruction for Central Imaging Systems. Mosaddegh, Saleh, 17 October 2011.
The primary goal of this thesis is to develop generic motion and structure algorithms for images of constructed scenes taken by various types of central imaging systems, including perspective, fish-eye and catadioptric systems. Assuming that the mapping between image pixels and their 3D rays in space is known, instead of image planes we work on image spheres (the projection of the images onto a unit sphere), which enables us to represent points over the entire view sphere, a representation well suited to omnidirectional images. In the first part of this thesis, we develop a generic and simple line matching approach for images of constructed scenes taken under short baseline motion, as well as a fast and original geometric constraint for matching lines in planar constructed scenes that is insensitive to the motion of the camera and valid for all types of central images, including omnidirectional ones. Next, we introduce a unique and efficient way of computing the overlap between two segments in perspective images, which considerably decreases the overall computational time of a segment-based motion estimation and reconstruction algorithm. Finally, in the last part of this thesis, we develop a simple motion estimation and surface reconstruction algorithm for piecewise planar scenes that is applicable to all kinds of central images, uses only two views and requires only a minimal number of line correspondences. To demonstrate the performance of these algorithms, we experiment with various real images taken by a simple perspective camera, a fish-eye lens, and two different kinds of paracatadioptric sensors: the first is a folded catadioptric camera and the second a classic paracatadioptric system composed of a parabolic mirror in front of a telecentric lens.
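The lifting onto the unit sphere described above, mapping pixels to unit-norm view rays, can be sketched for the simplest central model, a calibrated pinhole camera (illustrative only; the thesis handles general central systems, where the pixel-to-ray mapping comes from the sensor's own calibration):

```python
import numpy as np

def pixel_to_sphere(u, v, K):
    """Lift a pixel of a calibrated central perspective camera onto the
    unit sphere.

    K is the 3x3 intrinsic matrix. The ray direction K^-1 [u, v, 1]^T is
    normalized to unit length, giving the point on the view sphere.
    """
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))
    return ray / np.linalg.norm(ray)
```

Once every pixel is represented this way, the same matching and motion estimation machinery applies whether the rays came from a perspective lens, a fish-eye, or a catadioptric mirror, which is what makes the approach generic.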
|