Global ETD Search

91	3D Vision Geometry for Rolling Shutter Cameras / Géométrie pour la vision 3D avec des caméras Rolling Shutter Lao, Yizhen 16 May 2019 (has links) De nombreuses caméras CMOS modernes sont équipées de capteurs Rolling Shutter (RS). Ces caméras à bas coût et basse consommation permettent d’atteindre de très hautes fréquences d’acquisition. Dans ce mode d’acquisition, les lignes de pixels sont exposées séquentiellement du haut vers le bas de l'image. Par conséquent, les images capturées alors que la caméra et/ou la scène est en mouvement présentent des distorsions qui rendent les algorithmes classiques au mieux moins précis, au pire inutilisables en raison de singularités ou de configurations dégénérées. Le but de cette thèse est de revisiter la géométrie de la vision 3D avec des caméras RS en proposant des solutions pour chaque sous-tâche du pipeline de Structure-from-Motion (SfM). Le chapitre II présente une nouvelle méthode de correction du RS en utilisant les droites. Contrairement aux méthodes existantes, qui sont itératives et font l’hypothèse dite Manhattan World (MW), notre solution est linéaire et n’impose aucune contrainte sur l’orientation des droites 3D. De plus, la méthode est intégrée dans un processus de type RANSAC permettant de distinguer les courbes qui sont des projections de segments droits de celles qui correspondent à de vraies courbes 3D. La méthode de correction est ainsi plus robuste et entièrement automatisée. Le chapitre III revient sur l'ajustement de faisceaux ou bundle adjustment (BA). Nous proposons un nouvel algorithme basé sur une erreur de projection dans laquelle l’index de ligne des points projetés varie pendant l’optimisation afin de garder une cohérence géométrique, contrairement aux méthodes existantes qui considèrent un index fixe (celui mesuré dans l’image). 
Nous montrons que cela permet de lever la dégénérescence dans le cas où les directions de scan des images sont trop proches (cas très commun avec des caméras embarquées sur un véhicule par exemple). Dans le chapitre IV nous étendons le concept d'homographie au cas d’images RS en démontrant que la relation point-à-point entre deux images d’un nuage de points coplanaires peut s’exprimer sous la forme de 3 à 7 matrices de taille 3×3 en fonction du modèle de mouvement utilisé. Nous proposons une méthode linéaire pour le calcul de ces matrices. Ces dernières sont ensuite utilisées pour résoudre deux problèmes classiques en vision par ordinateur, à savoir le calcul du mouvement relatif et le « mosaïcing » dans le cas RS. Dans le chapitre V nous traitons le problème de calcul de pose et de reconstruction multi-vues en établissant une analogie avec les méthodes utilisées pour les surfaces déformables telles que SfT (Structure-from-Template) et NRSfM (Non Rigid Structure-from-Motion). Nous montrons qu’une image RS d’une scène rigide en mouvement peut être interprétée comme une image Global Shutter (GS) d’une surface virtuellement déformée (par l’effet RS). La solution proposée pour estimer la pose et la structure 3D de la scène est ainsi composée de deux étapes. D’abord, les déformations virtuelles sont calculées grâce à SfT ou NRSfM en supposant un modèle GS classique (relaxation du modèle RS). Ensuite, ces déformations sont réinterprétées comme étant le résultat du mouvement durant l’acquisition (réintroduction du modèle RS). L’approche proposée présente ainsi de meilleures propriétés de convergence que les approches existantes. / Many modern CMOS cameras are equipped with Rolling Shutter (RS) sensors, which make for low-cost, low-power and fast cameras. In this acquisition mode, the pixel rows are exposed sequentially from the top to the bottom of the image. Therefore, images captured by moving RS cameras exhibit distortions (e.g. 
wobble and skew) which make the classic algorithms at best less precise, at worst unusable due to singularities or degeneracies. The goal of this thesis is to propose a general framework for modelling and solving structure from motion (SfM) with RS cameras. Our approach consists in addressing each sub-task of the SfM pipeline (namely image correction, absolute and relative pose estimation, and bundle adjustment) and proposing improvements. The first part of this manuscript presents a novel RS correction method which uses line features. Unlike existing methods, which use iterative solutions and make the Manhattan World (MW) assumption, our method R4C linearly computes the camera's instantaneous motion using a few image features. In addition, the method is integrated into a RANSAC-like framework which enables us to detect curves that correspond to actual 3D straight lines and reject outlier curves, making image correction more robust and fully automated. The second part revisits Bundle Adjustment (BA) for RS images. It deals with a limitation of existing RS bundle adjustment methods when the read-out directions of the RS views are close to one another, which is a common configuration in many real-life applications. To address this, we propose a novel camera-based RS projection algorithm and incorporate it into RSBA to calculate reprojection errors. We found that this new algorithm lets SfM survive the degenerate configuration mentioned above. The third part proposes a new RS homography matrix based on point correspondences from an RS pair. Linear solvers for the computation of this matrix are also presented. Specifically, a practical solver with 13 point correspondences is proposed. In addition, we present two essential applications in computer vision that use RS homography: plane-based RS relative pose estimation and RS image stitching. 
The last part of this thesis studies the absolute camera pose problem (PnP) and SfM in the presence of RS effects by drawing analogies with non-rigid vision, namely Shape-from-Template (SfT) and Non-rigid SfM (NRSfM) respectively. Unlike all existing methods, which perform 3D-2D registration after augmenting the Global Shutter (GS) projection model with velocity parameters under various kinematic models, we propose to use local differential constraints. The proposed methods outperform the state of the art and handle configurations that are critical for existing methods. Rolling shutter Pose absolue et relative Homographie S-f-M Ajustement de faisceaux Rolling shutter Image correction Pose estimation Relative pose estimation Homography Structure from Motion Bundle Adjustment
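To make the idea in entry 91 concrete (the row index of a projected point depends on the camera motion, and should therefore not be held fixed), here is a minimal Python sketch of a rolling-shutter projection. It is not the thesis's algorithm: the translation-only, constant-velocity motion model and every name used (`rs_project`, `line_delay`, and so on) are assumptions made purely for illustration.

```python
def rs_project(point, velocity, f, cx, cy, line_delay, iters=20):
    """Project a 3D point (camera frame at t = 0) with a rolling-shutter camera.

    Assumed model (hypothetical, for illustration): the camera translates
    with constant velocity and image row v is exposed at t = v * line_delay.
    The row depends on the pose and the pose depends on the row, so we use
    a fixed-point iteration that starts from the global-shutter projection.
    """
    X, Y, Z = point
    vx, vy, vz = velocity
    u = f * X / Z + cx          # global-shutter initial guess
    v = f * Y / Z + cy
    for _ in range(iters):
        t = v * line_delay      # capture time of the current row estimate
        # The point expressed in the camera frame at time t.
        Xt, Yt, Zt = X - vx * t, Y - vy * t, Z - vz * t
        u = f * Xt / Zt + cx
        v = f * Yt / Zt + cy
    return u, v
```

With zero velocity the function returns the ordinary global-shutter projection; with a non-zero vertical velocity the returned row differs from the global-shutter row, which is the effect a fixed row index would ignore.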
92	Structure-from-motion For Systems With Perspective And Omnidirectional Cameras Bastanlar, Yalin 01 July 2009 (has links) (PDF) In this thesis, a pipeline for structure-from-motion with mixed camera types is described and methods for the steps of this pipeline to make it effective and automatic are proposed. These steps can be summarized as calibration, feature point matching, epipolar geometry and pose estimation, triangulation and bundle adjustment. We worked with catadioptric omnidirectional and perspective cameras and employed the sphere camera model, which encompasses single-viewpoint catadioptric systems as well as perspective cameras. For calibration of the sphere camera model, a new technique that has the advantage of linear and automatic parameter initialization is proposed. The projection of 3D points on a catadioptric image is represented linearly with a 6x10 projection matrix using lifted coordinates. This projection matrix is computed with an adequate number of 3D-2D correspondences and decomposed to obtain intrinsic and extrinsic parameters. Then, a non-linear optimization is performed to refine the parameters. For feature point matching between hybrid camera images, scale invariant feature transform (SIFT) is employed and a method is proposed to improve the SIFT matching output. With the proposed approach, omnidirectional-perspective matching performance significantly increases to enable automatic point matching. In addition, the use of virtual camera plane (VCP) images is evaluated, which are perspective images produced by unwarping the corresponding region in the omnidirectional image. The hybrid epipolar geometry is estimated using random sample consensus (RANSAC) and alternatives of pose estimation methods are evaluated. A weighting strategy for iterative linear triangulation which improves the structure estimation accuracy is proposed. 
Finally, multi-view structure-from-motion (SfM) is performed by adding views to the structure one by one. To refine the structure estimated from multiple views, the sparse bundle adjustment method is employed, modified to use the sphere camera model. Experiments on simulated and real images are conducted for the proposed approaches. The results of hybrid multi-view SfM with real images are also demonstrated, emphasizing the cases where it is advantageous to use omnidirectional cameras together with perspective cameras.
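The sphere camera model mentioned in entry 92 can be sketched in a few lines of Python. This is a generic illustration of the unified sphere projection, not code from the thesis: the function name `sphere_project`, the single focal length `f` and the visibility check are assumptions, and lens distortion is ignored.

```python
import math

def sphere_project(point, xi, f, cx, cy):
    """Project a 3D point with the unified sphere camera model.

    Steps: (1) normalise the point onto the unit sphere, (2) project it
    from a centre shifted by xi along the optical axis onto the normalised
    image plane, (3) apply the focal length and principal point.
    xi = 0 gives an ordinary perspective (pinhole) camera.
    """
    X, Y, Z = point
    norm = math.sqrt(X * X + Y * Y + Z * Z)
    xs, ys, zs = X / norm, Y / norm, Z / norm
    denom = zs + xi
    if denom <= 0:
        # Convention of this sketch: such points are treated as not visible.
        raise ValueError("point is outside the valid field of view")
    return f * xs / denom + cx, f * ys / denom + cy
```

For example, a point at 90 degrees from the optical axis, `(1, 0, 0)`, cannot be imaged with `xi = 0` but projects to a finite pixel with `xi = 1`, which is why one model can cover both perspective and catadioptric cameras.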
93	Modélisation 3D automatique d'environnements : une approche éparse à partir d'images prises par une caméra catadioptrique Yu, Shuda 03 June 2013 (has links) (PDF) Automatic 3D modelling of an environment from images remains an active topic in computer vision. The problem is generally solved in three steps: moving a camera through the scene to capture the image sequence, reconstructing the geometry, and using a dense stereo method to obtain a surface of the scene. The second step matches interest points across the images and then simultaneously estimates the camera poses and a sparse cloud of 3D scene points corresponding to the interest points. The third step uses the information from all pixels to reconstruct a surface of the scene, for example by estimating a dense point cloud. Here we propose to address the problem by computing a surface directly from the sparse point cloud and its visibility information, both provided by the geometry estimation. The advantages are low time and space complexity, which is useful, for example, for obtaining compact models of large environments such as a city. To this end, we present a surface reconstruction method of the sculpting type in a 3D Delaunay triangulation of the reconstructed points. The visibility information is used to classify the tetrahedra as free space or matter. A surface is then extracted so as to best separate these tetrahedra, using a greedy method and a small number of Steiner points. The surface is constrained to be a 2-manifold to allow standard subsequent processing such as smoothing and refinement by photo-consistency optimization. 
This method was then extended to the incremental case: for each new keyframe selected in a video, new 3D points and a new pose are estimated, and the surface is then updated. The time complexity is studied in both cases (incremental or not). In the experiments, we use a low-cost catadioptric camera and obtain textured 3D models of complete environments including buildings, ground and vegetation. A drawback of our methods is that thin scene elements, such as tree branches and electricity pylons, are not reconstructed correctly. [SPI:OTHER] Engineering Sciences/Other 2-manifold reconstruction 3D Delaunay triangulation Steiner vertices Complexity analysis Sparse point cloud Structure-from-Motion
94	Widening the basin of convergence for the bundle adjustment type of problems in computer vision Hong, Je Hyeong January 2018 (has links) Bundle adjustment is the process of simultaneously optimizing camera poses and 3D structure given image point tracks. In structure-from-motion, it is typically used as the final refinement step due to the nonlinearity of the problem, meaning that it requires sufficiently good initialization. Contrary to this belief, recent literature showed that useful solutions can be obtained even from arbitrary initialization for fixed-rank matrix factorization problems, including bundle adjustment with affine cameras. This property of wide convergence basin of high quality optima is desirable for any nonlinear optimization algorithm since obtaining good initial values can often be non-trivial. The aim of this thesis is to find the key factor behind the success of these recent matrix factorization algorithms and explore the potential applicability of the findings to bundle adjustment, which is closely related to matrix factorization. The thesis begins by unifying a handful of matrix factorization algorithms and comparing similarities and differences between them. The theoretical analysis shows that the set of successful algorithms actually stems from the same root of the optimization method called variable projection (VarPro). The investigation then extends to address why VarPro outperforms the joint optimization technique, which is widely used in computer vision. This algorithmic comparison of these methods yields a larger unification, leading to a conclusion that VarPro benefits from an unequal trust region assumption between two matrix factors. The thesis then explores ways to incorporate VarPro to bundle adjustment problems using projective and perspective cameras. 
Unfortunately, the added nonlinearity causes a substantial decrease in the convergence basin of VarPro, and therefore a bootstrapping strategy is proposed to bypass this issue. Experimental results show that it is possible to yield feasible metric reconstructions and pose estimations from arbitrary initialization given relatively clean point tracks, taking one step towards initialization-free structure-from-motion.
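The variable projection (VarPro) idea discussed in entry 94 can be illustrated on the smallest possible case, a rank-1 factorization M ≈ u vᵀ with missing entries. The Python sketch below shows only the elimination step, where v is solved in closed form for a given u so that the cost becomes a function of u alone; a full VarPro solver would then minimise this reduced cost with a Gauss-Newton-type method. The function names and the 0/1 mask `W` are assumptions made for illustration, not the thesis's code.

```python
def best_v(M, W, u):
    """Closed-form optimal v for a fixed u (one 1D least squares per column)."""
    n_rows, n_cols = len(M), len(M[0])
    v = []
    for j in range(n_cols):
        num = sum(W[i][j] * u[i] * M[i][j] for i in range(n_rows))
        den = sum(W[i][j] * u[i] * u[i] for i in range(n_rows))
        v.append(num / den if den > 0 else 0.0)
    return v

def reduced_cost(M, W, u):
    """VarPro-style reduced objective: squared error as a function of u only.

    W[i][j] is 1 for observed entries and 0 for missing ones.
    """
    v = best_v(M, W, u)
    return sum(W[i][j] * (M[i][j] - u[i] * v[j]) ** 2
               for i in range(len(M)) for j in range(len(M[0])))
```

Because v is always at its optimum for the current u, any u proportional to the true factor gives zero reduced cost on noise-free data, regardless of how v would have been initialised.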
95	Méthodes de reconstruction tridimensionnelle intégrant des points cycliques : application au suivi d’une caméra / Structure-from-Motion paradigms integrating circular points : application to camera tracking Calvet, Lilian 23 January 2014 (has links) Cette thèse traite de la reconstruction tridimensionnelle d’une scène rigide à partir d’une collection de photographies numériques, dites vues. Le problème traité est connu sous le nom du "calcul de la structure et du mouvement" (structure-and/from-motion) qui consiste à "expliquer" des trajectoires de points dits d’intérêt au sein de la collection de vues par un certain mouvement de l’appareil (dont sa trajectoire) et des caractéristiques géométriques tridimensionnelles de la scène. Dans ce travail, nous proposons les fondements théoriques pour étendre certaines méthodes de calcul de la structure et du mouvement afin d’intégrer comme données d’entrée, des points d’intérêt réels et des points d’intérêt complexes, et plus précisément des images de points cycliques. Pour tout plan projectif, les points cycliques forment une paire de points complexes conjugués qui, par leur invariance par les similitudes planes, munissent le plan projectif d’une structure euclidienne. Nous introduisons la notion de marqueurs cycliques qui sont des marqueurs plans permettant de calculer sans ambiguïté les images des points cycliques de leur plan de support dans toute vue. Une propriété de ces marqueurs, en plus d’être très "riches" en information euclidienne, est que leurs images peuvent être appariées même si les marqueurs sont disposés arbitrairement sur des plans parallèles, grâce à l’invariance des points cycliques. Nous montrons comment utiliser cette propriété dans le calcul projectif de la structure et du mouvement via une technique matricielle de réduction de rang, dite de factorisation, de la matrice des données correspondant aux images de points réels, complexes et/ou cycliques. 
Un sous-problème critique abordé dans le calcul de la structure et du mouvement est celui de l’auto-calibrage de l’appareil, problème consistant à transformer un calcul projectif en un calcul euclidien. Nous expliquons comment utiliser l’information euclidienne fournie par les images des points cycliques dans l’algorithme d’auto-calibrage opérant dans l’espace projectif dual et fondé sur des équations linéaires. L’ensemble de ces contributions est finalement utilisé pour une application de suivi automatique de caméra utilisant des marqueurs formés par des couronnes concentriques (appelés CCTags), où il s’agit de calculer le mouvement tridimensionnel de la caméra dans la scène à partir d’une séquence vidéo. Ce type d’application est généralement utilisé dans l’industrie du cinéma ou de la télévision afin de produire des effets spéciaux. Le suivi de caméra proposé dans ce travail a été conçu pour proposer le meilleur compromis possible entre flexibilité d’utilisation et précision des résultats obtenus. / The thesis deals with the problem of 3D reconstruction of a rigid scene from a collection of views acquired by a digital camera. The problem addressed, referred to as the Structure-from-Motion (SfM) problem, consists in computing the camera motion (including its trajectory) and the 3D characteristics of the scene from the 2D trajectories of imaged features through the collection. We propose theoretical foundations to extend some SfM paradigms in order to integrate real as well as complex imaged features as input data, and more specifically imaged circular points. The circular points of a projective plane form a complex-conjugate pair of points that is fixed under plane similarities, thus endowing the plane with a Euclidean structure. We introduce the notion of circular markers, which are planar markers that allow the imaged circular points of their supporting plane to be computed, without any ambiguity, in all views. 
Aside from providing very “rich” Euclidean information, such features can be matched even if they are arbitrarily positioned on parallel planes, thanks to their invariance under plane similarities, thus increasing their visibility compared to natural features. We show how to benefit from this geometric property in solving the projective SfM problem via a rank-reduction technique, referred to as projective factorization, of the matrix whose entries are images of real, complex and/or circular features. One of the critical issues in such an SfM paradigm is the self-calibration problem, which consists in upgrading a projective reconstruction to a Euclidean one. We explain how to use the Euclidean information provided by imaged circular points in the self-calibration algorithm operating in the dual projective space and relying on linear equations. All these contributions are finally used in an automatic camera tracking application relying on markers made up of concentric circles (called C2Tags). The problem consists in computing the 3D camera motion from a video sequence. This kind of application is generally used in the cinema or TV industry to create special effects. The camera tracking proposed in this work is designed to provide the best compromise between flexibility of use and accuracy. Points cycliques Reconstruction 3D Suivi de caméra Marqueur Détection 3D reconstruction Structure-from-Motion Camera self-calibration Fiducial marker detection Camera tracking
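The invariance property that entry 95 relies on can be checked numerically. The Python sketch below maps the circular points I = (1, i, 0) and J = (1, -i, 0) of a plane through a 3×3 homography and tests whether two complex homogeneous vectors represent the same projective point. It illustrates the textbook property only; the function names are hypothetical and nothing here reproduces the marker detection described in the thesis.

```python
def imaged_circular_points(H):
    """Images of the circular points I = (1, i, 0), J = (1, -i, 0) under H.

    H is a 3x3 homography given as nested lists. Since the third coordinate
    of I and J is zero, the images are h1 + i*h2 and h1 - i*h2, where h1 and
    h2 are the first two columns of H.
    """
    h1 = [H[r][0] for r in range(3)]
    h2 = [H[r][1] for r in range(3)]
    I_img = [complex(a, b) for a, b in zip(h1, h2)]
    J_img = [complex(a, -b) for a, b in zip(h1, h2)]
    return I_img, J_img

def same_projective_point(p, q, tol=1e-9):
    """True if the complex homogeneous vectors p and q are proportional."""
    cross = [p[1] * q[2] - p[2] * q[1],
             p[2] * q[0] - p[0] * q[2],
             p[0] * q[1] - p[1] * q[0]]
    return all(abs(c) < tol for c in cross)
```

Feeding in a plane similarity (rotation, uniform scale and translation) returns I and J unchanged up to scale, whereas a non-uniform scaling such as diag(2, 1, 1) moves them.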
96	Characterization of Landslide Geometry and Movement Near Black Canyon City, Arizona January 2016 (has links) abstract: I investigate the Black Canyon City landslide (BCC landslide), a prominent deep-seated landslide located northeast of Black Canyon City, Arizona. Although the landslide does not appear to pose a significant hazard to structures, its prominent features and high topographic relief make it an excellent site to study the geologic setting under which such features develop. This study has the potential to contribute toward understanding the landscape evolution in similar geologic and topographic settings, and toward characterizing the underlying structural processes of this deep-seated feature. We use field-based and remote surface geology and geomorphological mapping to characterize the landslide geometry and its surface displacement. We use the Structure from Motion (SfM) method to generate a 0.2 m resolution digital elevation model and rectified ortho-photo imagery from unmanned aerial vehicle (UAV)- and balloon-based images, and use them as the base map for our mapping. The ~0.6 km² landslide is easily identified through remotely-sensed imagery and in the field because of the prominent east-west trending fractures defining its upper extensional portion. The landslide displaces a series of Early and Middle Miocene volcanic and sedimentary rocks. The main head scarp is ~600 m long and oriented E-W, with some minor scarps oriented NW-SE. Numerous fractures, with openings ranging from millimeters to meters, were identified throughout the landslide body (mostly with longitudinal orientation). A distinctive layer of dark reddish basalt provides a key displaced marker for estimating the long-term deformation of the slide mass. Using this marker, the total vertical displacement is estimated to be ~70 m, with maximum movement of ~95 m to the SE. This study indicates that the landslide motion is translational with a slight rotational character. 
We estimate the rate of the slide motion by resurveying monuments on and off the slide and examining disturbed vegetation located along the fractures. The analysis indicates a slow integrated average landslide velocity of 10-60 mm/yr. The slide motion is probably driven during annual wet periods, when increased saturation of the slide mass weakens the basal slip surface and increases the overall mass of the slide. Results from our study suggest that the slide is stable and does not pose a significant hazard to the surrounding area, provided there are no extreme changes in environmental conditions. Although the landslide is categorized as very slow (according to Cruden and Varnes, 1996), monitoring the landslide is still necessary. / Dissertation/Thesis / Masters Thesis Geological Sciences 2016 Geology Geological engineering Agisoft Photoscan Black Canyon City Landslide DEM Analysis Landslide Geometry Structure from Motion Surface Displacement
97	Infrared image-based modeling and rendering Wretstam, Oskar January 2017 (has links) Image-based modeling using visual images has undergone major development during the early part of the 21st century. In this thesis a system for automated uncalibrated scene reconstruction using infrared images is implemented and tested. An automated reconstruction system could serve to simplify thermal inspection or as a demonstration tool. Thermal images will in general have lower resolution, less contrast and less high-frequency content than visual images. These characteristics of infrared images further complicate feature extraction and matching, key steps in the reconstruction process. To remedy this complication, preprocessing methods are suggested and tested as well. Infrared modeling also imposes additional demands on the reconstruction, as it is important to maintain the thermal accuracy of the images in the product. Three main results are obtained from this thesis. Firstly, it is possible to obtain camera calibration and pose as well as a sparse point cloud reconstruction from an infrared image sequence using the suggested implementation. Secondly, correlation of thermal measurements from the images used to reconstruct three-dimensional coordinates is presented and analyzed. Lastly, from the preprocessing evaluation it is concluded that the tested methods are not suitable: they increase computational cost without a proportional improvement in the model. / Bildbaserad modellering med visuella bilder har genomgått en stor utveckling under de tidigare delarna av 2000-talet. Givet en sekvens bestående av vanliga tvådimensionella bilder på en scen från olika perspektiv så är målet att rekonstruera en tredimensionell modell. I denna avhandling implementeras och testas ett system för automatiserad okalibrerad scenrekonstruktion från infraröda bilder. 
Okalibrerad rekonstruktion refererar till det faktum att parametrar för kameran, såsom fokallängd och fokus, är okända och enbart bilder används som indata till systemet. Ett stort användningsområde för värmekameror är inspektion. Temperaturskillnader i en bild kan indikera till exempel dålig isolering eller hög friktion. Om ett automatiserat system kan skapa en tredimensionell modell av en scen så kan det bidra till att förenkla inspektion samt till att ge en bättre överblick. Värmebilder kommer generellt att ha lägre upplösning, mindre kontrast och mindre högfrekvensinnehåll jämfört med visuella bilder. Dessa egenskaper hos infraröda bilder komplicerar extraktion och matchning av punkter i bilderna vilket är viktiga steg i rekonstruktionen. För att åtgärda komplikationen förbehandlas bilderna innan rekonstruktionen, ett urval av metoder för förbehandling har testats. Rekonstruktion med värmebilder kommer också att ställa ytterligare krav på rekonstruktionen, detta eftersom det är viktigt att bibehålla termisk noggrannhet från bilderna i modellen. Tre huvudresultat erhålls från denna avhandling. För det första är det möjligt att beräkna kamerakalibrering och position såväl som en gles rekonstruktion från en infraröd bildsekvens, detta med implementationen som föreslås i denna avhandling. För det andra presenteras och analyseras korrelationen för temperaturmätningar i bilderna som används för rekonstruktionen. Slutligen så visar den testade förbehandlingen inte en förbättring av rekonstruktionen som är proportionerlig med den ökade beräkningskomplexiteten. scene reconstruction structure from motion multi-view stereo image-based modeling infrared thermography infrared images computer vision
98	Implementation and evaluation of a 3D tracker / Implementation och utvärdering av en 3D tracker Robinson, Andreas January 2014 (has links) Many methods have been developed for visual tracking of generic objects. The vast majority of these assume the world is two-dimensional, either ignoring the third dimension or only dealing with it indirectly. This causes difficulties for the tracker when the target approaches or moves away from the camera, is occluded or moves out of the camera frame. Unmanned aerial vehicles (UAVs) are increasingly used in civilian applications and some of these will undoubtedly carry tracking systems in the future. As they move around, these trackers will encounter both scale changes and occlusions. To improve the tracking performance in these cases, the third dimension should be taken into account. This thesis extends the capabilities of a 2D tracker to three dimensions, with the assumption that the target moves on a ground plane. The position of the tracker camera is established by matching the video it produces to a sparse point-cloud map built with off-the-shelf structure-from-motion software. A target is tracked with a generic 2D tracker and subsequently positioned on the ground. Should the target disappear from view, its motion on the ground is predicted. In combination, these simple techniques are shown to improve the robustness of a tracking system on a moving platform under target scale changes and occlusions. 3D visual object tracking tracking unmanned aerial vehicle UAV structure from motion
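Entry 98 positions a 2D-tracked target on a ground plane. One simple way to do that, sketched below in Python, is to intersect the viewing ray through the target's pixel with the plane. This is a generic ray-plane intersection under the assumption that the camera centre and ray direction are already known in world coordinates; the function name and default plane (z = 0) are hypothetical and the thesis may proceed differently.

```python
def intersect_ground(cam_center, ray_dir,
                     plane_normal=(0.0, 0.0, 1.0), plane_offset=0.0):
    """Intersect a viewing ray with the plane n . X + d = 0.

    cam_center and ray_dir are given in world coordinates. Returns the 3D
    intersection point, or None if the ray is parallel to the plane or the
    plane lies behind the camera along this ray.
    """
    n_dot_c = sum(n * c for n, c in zip(plane_normal, cam_center))
    n_dot_r = sum(n * r for n, r in zip(plane_normal, ray_dir))
    if abs(n_dot_r) < 1e-12:
        return None                      # ray parallel to the ground
    t = -(n_dot_c + plane_offset) / n_dot_r
    if t <= 0:
        return None                      # intersection behind the camera
    return tuple(c + t * r for c, r in zip(cam_center, ray_dir))
```

A camera 10 units above the ground looking down at 45 degrees, for instance, places the target 10 units ahead of the point directly below the camera.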
99	3D structure estimation from image stream in urban environment / Estimation de la structure 3D d'un environnement urbain à partir d'un flux vidéo Nawaf, Mohamad Motasem 05 December 2014 (has links) Dans le domaine de la vision par ordinateur, l’estimation de la structure d’une scène 3D à partir d’images 2D constitue un problème fondamental. Parmi les applications concernées par cette problématique, nous nous sommes intéressés dans le cadre de cette thèse à la modélisation d’un environnement urbain. Nous nous sommes intéressés à la reconstruction de scènes 3D à partir d’images monoculaires générées par un véhicule en mouvement. Ici, plusieurs défis se posent à travers les différentes étapes de la chaine de traitement inhérente à la reconstruction 3D. L’un de ces défis vient du fait de l’absence de zones suffisamment texturées dans certaines scènes urbaines, d’où une reconstruction 3D (un nuage de points 3D) trop éparse. De plus, du fait du mouvement du véhicule, d’une image à l’autre il n’y a pas toujours un recouvrement suffisant entre différentes vues consécutives d’une même scène. Dans ce contexte, et ce afin de lever les verrous ci-dessus mentionnés, nous proposons d’estimer, de reconstruire, la structure d’une scène 3D par morceaux en se basant sur une hypothèse de planéité. Nous proposons plusieurs améliorations à la chaine de traitement associée à la reconstruction 3D. D’abord, afin de structurer, de représenter, la scène sous la forme d’entités planes nous proposons une nouvelle méthode de reconstruction 3D, basée sur le regroupement de pixels similaires (superpixel segmentation), qui à travers une représentation multi-échelle pondérée fusionne les informations de couleur et de mouvement. Cette méthode est basée sur l’estimation de la probabilité de discontinuités locales aux frontières des régions calculées à partir du gradient (gradientbased boundary probability estimation). 
Afin de prendre en compte l’incertitude liée à l’estimation du mouvement, une pondération par morceaux est appliquée à chaque pixel en fonction de cette incertitude. Cette méthode génère des regroupements de pixels (superpixels) non contraints en termes de taille et de forme. Pour certaines applications, telle que la reconstruction 3D à partir d’une séquence d’images, des contraintes de taille sont nécessaires. Nous avons donc proposé une méthode qui intègre à l’algorithme SLIC (Simple Linear Iterative Clustering) l’information de mouvement. L’objectif étant d’obtenir une reconstruction 3D plus dense qui estime mieux la structure de la scène. Pour atteindre cet objectif, nous avons aussi introduit une nouvelle distance qui, en complément de l’information de mouvement et de données images, prend en compte la densité du nuage de points. Afin d’augmenter la densité du nuage de points utilisé pour reconstruire la structure de la scène sous la forme de surfaces planes, nous proposons une nouvelle approche qui mixte plusieurs méthodes d’appariement et une méthode de flot optique dense. Cette méthode est basée sur un système de pondération qui attribue un poids pré-calculé par apprentissage à chaque point reconstruit. L’objectif est de contrôler l’impact de ce système de pondération, autrement dit la qualité de la reconstruction, en fonction de la précision de la méthode d’appariement utilisée. Pour atteindre cet objectif, nous avons appliqué un processus des moindres carrés pondérés aux données reconstruites pondérées par les calculés par apprentissage, qui en complément de la segmentation par morceaux de la séquence d’images, permet une meilleure reconstruction de la structure de la scène sous la forme de surfaces planes. Nous avons également proposé un processus de gestion des discontinuités locales aux frontières de régions voisines dues à des occlusions (occlusion boundaries) qui favorise la coplanarité et la connectivité des régions connexes. 
L’ensemble des modèles proposés permet de générer une reconstruction 3D dense représentative de la réalité de la scène. La pertinence des modèles proposés a été étudiée et comparée à l’état de l’art. Plusieurs expérimentations ont été réalisées afin de démontrer, d’étayer, la validité de notre approche. / In computer vision, 3D structure estimation from 2D images remains a fundamental problem. One of the emergent applications is 3D urban modelling and mapping. Here, we are interested in street-level monocular 3D reconstruction from a moving vehicle. In this particular case, several challenges arise at different stages of the 3D reconstruction pipeline. Mainly, the lack of textured areas in urban scenes produces a low-density reconstructed point cloud. Also, the continuous motion of the vehicle prevents redundant views of the scene, so feature points have short lifetimes. In this context, we adopt piecewise planar 3D reconstruction, where the planarity assumption overcomes the aforementioned challenges. In this thesis, we introduce several improvements to the 3D structure estimation pipeline, in particular to the piecewise planar scene representation and modelling. First, we propose a novel approach that aims at creating a superpixel segmentation that respects 3D geometry, based on gradient-based boundary probability estimation that fuses colour and flow information using a weighted multi-layered model. A pixel-wise weighting is used in the fusion process, which takes into account the uncertainty of the computed flow. This method produces superpixels that are unconstrained in size and shape. For applications that require size-constrained superpixels, such as 3D reconstruction from an image sequence, we develop a flow-based SLIC method to produce superpixels that are adapted to the density of reconstructed points for better planar structure fitting. 
This is achieved by means of a new distance measure that takes into account an input density map, in addition to the flow and spatial information. To increase the density of the reconstructed point cloud used to perform the planar structure fitting, we propose a new approach that uses several matching methods and dense optical flow. A weighting scheme assigns a learned weight to each reconstructed point to control its impact on the structure fit according to the accuracy of the matching method used. Then, a weighted total least squares model uses the reconstructed points and learned weights to fit a planar structure with the help of a superpixel segmentation of the input image sequence. Moreover, the model handles the occlusion boundaries between neighbouring scene patches to encourage connectivity and co-planarity and so produce more realistic models. The final output is a complete, dense, visually appealing 3D model. The validity of the proposed approaches has been substantiated by comprehensive experiments and comparisons with state-of-the-art methods. Reconstruction 3D Regroupement de pixels Structure du mouvement Regroupement Modélisation 3D Reconstruction urbaine Apprentissage automatique 3D reconstruction Superpixel segmentation Structure from motion Clustering 3D modeling Urban reconstruction Machine learning
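The weighted planar fit mentioned in the abstract above can be illustrated with a weighted orthogonal (total least squares) plane fit. This is only a minimal sketch assuming NumPy is available; the function name `fit_plane_weighted_tls`, the use of one scalar weight per point and the eigen-decomposition route are assumptions made for illustration, not the thesis's implementation, which also relies on superpixels and occlusion handling.

```python
import numpy as np

def fit_plane_weighted_tls(points, weights):
    """Fit a plane n.x + d = 0 that minimises the weighted sum of squared
    orthogonal distances (hypothetical helper, for illustration only)."""
    points = np.asarray(points, dtype=float)    # shape (N, 3)
    weights = np.asarray(weights, dtype=float)  # shape (N,)
    w = weights / weights.sum()                 # normalise the weights
    centroid = w @ points                       # weighted centroid lies on the plane
    centered = points - centroid
    # Weighted 3x3 scatter matrix of the centred points.
    scatter = (centered * w[:, None]).T @ centered
    # eigh returns eigenvalues in ascending order; the plane normal is the
    # eigenvector of the smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(scatter)
    normal = eigvecs[:, 0]
    d = -normal @ centroid
    return normal, d
```

Giving a point a low weight reduces its pull on the plane, which is how a less accurate matching method could be down-weighted in such a scheme.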
100	Learning objects model and context for recognition and localisation / Apprentissage de modèles et contextes d'objets pour la reconnaissance et la localisation Manfredi, Guido 18 September 2015 This thesis addresses the problems of modeling, recognition, localization and use of context for object manipulation by a robot. The modeling process is divided into four components: the real system, the sensor data, the properties to reproduce and the model. By specifying each of these components, it is possible to define a modeling process suited to the problem at hand, object manipulation by a robot. This analysis leads to the adoption of local texture descriptors for modeling. Modeling based on local texture descriptors has been addressed in many works on structure from motion (SfM) or simultaneous localization and mapping (SLAM). Existing methods include Bundler, Roboearth and 123DCatch. Yet none of these methods has gained consensus. Indeed, implementing a similar approach shows that these tools are hard to use even for expert users and that they produce highly complex models. This complexity is useful for providing a model that is robust to viewpoint variations. There are two ways for a model to be robust: the multiple views paradigm or the strong descriptors paradigm. In the multiple views paradigm, the model is built from a large number of viewpoints of the object. The strong descriptors paradigm relies on descriptors that are resistant to viewpoint changes. The experiments carried out show that strong descriptors make it possible to use a small number of views, which results in a simple model. 
These simple models do not include all existing viewpoints, but the blind spots can be compensated for by the fact that the robot is mobile and can adopt several viewpoints. Building on simple models, it is possible to define modeling methods based on images alone, which can be retrieved from the Internet. As an illustration, starting from a product name, it is possible to retrieve images from online stores fully automatically, and then to model and localize the desired objects. Even with simpler modeling, in real cases where many objects must be taken into account, problems arise in storing and processing such a mass of data. This breaks down into a complexity problem (many models must be processed quickly) and an ambiguity problem (models can resemble one another). The impact of these two problems can be reduced by using contextual information. Context is any information that does not come from the object itself and that helps recognition. Two types of context are addressed here: the place and the surrounding objects. Certain objects are found in certain particular places. By knowing these place/object links, it is possible to reduce the list of candidate objects that may appear in a given place. Moreover, the place/object link can be learned automatically by a robot that models and then explores an environment. The learned information can then be fused with the current visual information to improve recognition. In the case of surrounding objects, an object can often appear alongside other objects, for example a mouse and a keyboard. By knowing how frequently an object appears with other objects, it is possible to reduce the list of candidates during recognition. 
A Markov Logic Network is particularly well suited to fusing this type of data. This thesis shows the synergy of robotics and context for object modeling, recognition and localization. / This thesis addresses the modeling, recognition, localization and use of context for object manipulation by a robot. We start by presenting the modeling process and its components: the real system, the sensor data, the properties to reproduce and the model. We show how, by specifying each of them, one can define a modeling process adapted to the problem at hand, namely object manipulation by a robot. This analysis leads us to the adoption of local textured descriptors for object modeling. Modeling with local textured descriptors is not a new concept; it is the subject of many Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM) works. Existing methods include Bundler, the Roboearth modeler and 123DCatch. Still, no method has gained widespread adoption. By implementing a similar approach, we show that they are hard to use even for expert users and produce highly complex models. Such complex techniques are necessary to guarantee the robustness of the model to viewpoint changes. There are two ways to handle the problem: the multiple views paradigm and the robust features paradigm. The multiple views paradigm advocates using a large number of views of the object. The robust features paradigm relies on robust features able to resist large viewpoint changes. We present a set of experiments to provide an insight into the right balance between both. By varying the number of views and using different features, we show that small and fast models can provide robustness to viewpoint changes up to bounded blind spots, which can be handled by robotic means. We propose four different methods to build simple models from images only, with as little a priori information as possible. 
The first one applies to planar or piecewise planar objects and relies on homographies for localization. The second approach is applicable to objects with simple geometry, such as cylinders or spheres, but requires many measurements of the object. The third method requires the use of a calibrated 3D sensor but no additional information. The fourth technique needs no a priori information at all. We apply this last method to the autonomous modeling of grocery objects. From images automatically retrieved from a grocery store website, we build a model which allows recognition and localization for tracking. Even using light models, real situations call for numerous object models to be stored and processed. This poses the problems of complexity (processing multiple models quickly) and ambiguity (distinguishing similar objects). We propose to solve both problems by using contextual information. Contextual information is any information helping the recognition which is not directly provided by sensors. We focus on two contextual cues: the place and the surrounding objects. Some objects are mainly found in some particular places. By knowing the current place, one can restrict the number of possible identities for a given object. We propose a method to autonomously explore a previously labeled environment and establish a correspondence between objects and places. Then this information can be used in a cascade combining simple visual descriptors and context. This experiment shows that, for some objects, recognition can be achieved with as few as two simple features and the location as context. The objects surrounding a given object can also be used as context. Objects like a keyboard, a mouse and a monitor are often close together. We use qualitative spatial descriptors to describe the position of objects with respect to their neighbors. Using a Markov Logic Network, we learn patterns in the arrangement of objects. 
This information can then be used to recognize an object when surrounding objects are already identified. This thesis stresses the good match between robotics, context and object recognition. Modélisation Reconnaissance Localisation Contexte Cooccurrence d'objets Réseaux logiques de Markov Structure par le mouvement SLAM Structure from motion Object modeling Object recognition Object localization Context learning Robotics
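The place-based restriction of candidate identities described in the abstract above can be sketched with simple counts. This is a hedged illustration only: the thesis fuses context with a Markov Logic Network, whereas this sketch substitutes a plain frequency table, and the names `learn_place_object_counts`, `restrict_candidates` and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict

def learn_place_object_counts(observations):
    """Count how often each object label was observed in each place.

    observations: iterable of (place, object_label) pairs, for example
    gathered while a robot explores a labeled environment.
    """
    counts = defaultdict(Counter)
    for place, label in observations:
        counts[place][label] += 1
    return counts

def restrict_candidates(candidates, place, counts, min_count=1):
    """Keep only the candidates seen at least min_count times in this place."""
    seen = counts.get(place, Counter())  # unknown place: nothing was seen
    return [c for c in candidates if seen[c] >= min_count]
```

With observations such as a mug and a kettle in the kitchen and a keyboard in the office, calling `restrict_candidates` for the kitchen drops the keyboard from the candidate list before any visual matching is attempted.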

Search results