1

A Comparative Study On Pose Estimation Algorithms Using Visual Data

Cetinkaya, Guven 01 February 2012
Computation of the position and orientation of an object with respect to a camera from its images is called the pose estimation problem. Pose estimation is one of the major problems in computer vision, robotics, and photogrammetry; object tracking, object recognition, and self-localization of robots are typical applications. Determining the pose of an object from its projections requires the 3D model of the object in its own reference frame, the camera parameters, and a 2D image of the object. Most pose estimation algorithms require correspondences between the 3D model points and the 2D image points. In this study, four well-known algorithms that require the 2D-3D correspondences to be known a priori, namely Orthogonal Iterations, POSIT, DLT, and Efficient PnP, are compared. Moreover, two other well-known algorithms that solve the correspondence and pose problems simultaneously, SoftPOSIT and Blind-PnP, are also compared within the scope of this thesis. In the first step of the simulations, synthetic data is generated using a realistic motion scenario and the algorithms are compared on this data. In the next step, real images captured by a calibrated camera for an object with a known 3D model are used. The simulation results indicate that the POSIT algorithm performs best among the algorithms requiring point correspondences, and that SoftPOSIT can be considered to perform better than Blind-PnP.
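The correspondence-based methods compared here all solve the same Perspective-n-Point problem: given 3D model points, their 2D projections, and the camera intrinsics, recover rotation and translation. As a hedged illustration (not the thesis code, and with made-up point data), OpenCV exposes both an iterative DLT-style solver and EPnP behind a single call:

```python
import cv2
import numpy as np

# Known 3D model points (object frame) and their 2D projections (pixels).
# The coordinates below are illustrative placeholders.
object_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0],
                       [0, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=np.float64)
image_pts = np.array([[320, 240], [410, 245], [415, 330],
                      [325, 335], [300, 200], [395, 205]], dtype=np.float64)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)  # assume an undistorted, calibrated camera

# Compare two of the solver families discussed above.
for flag in (cv2.SOLVEPNP_ITERATIVE, cv2.SOLVEPNP_EPNP):
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist, flags=flag)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    print(flag, ok, tvec.ravel())
```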
2

Steps towards the object semantic hierarchy

Xu, Changhai, 1977- 17 November 2011
An intelligent robot must be able to perceive and reason robustly about its world in terms of objects, among other foundational concepts. The robot can draw on rich data for object perception from continuous sensory input, in contrast to the usual formulation that focuses on objects in isolated still images. Additionally, the robot needs multiple object representations to deal with different tasks and/or different classes of objects. We propose the Object Semantic Hierarchy (OSH), which consists of multiple representations with different ontologies. The OSH factors the problems of object perception so that intermediate states of knowledge about an object have natural representations, with relatively easy transitions from less structured to more structured representations. Each layer in the hierarchy builds an explanation of the sensory input stream, in terms of a stochastic model consisting of a deterministic model and an unexplained "noise" term. Each layer is constructed by identifying new invariants from the previous layer. In the final model, the scene is explained in terms of constant background and object models, and low-dimensional dynamic poses of the observer and objects.

The OSH contains two types of layers: the Object Layers and the Model Layers. The Object Layers describe how the static background and each foreground object are individuated, and the Model Layers describe how the model for the static background or each foreground object evolves from less structured to more structured representations. Each object or background model contains the following layers: (1) 2D object in 2D space (2D2D): a set of constant 2D object views, and the time-variant 2D object poses; (2) 2D object in 3D space (2D3D): a collection of constant 2D components, with their individual time-variant 3D poses; and (3) 3D object in 3D space (3D3D): the same collection of constant 2D components but with invariant relations among their 3D poses, and the time-variant 3D pose of the object as a whole.

In building 2D2D object models, a fundamental problem is to segment out foreground objects in the pixel-level sensory input from the background environment, where motion information is an important cue to perform the segmentation. Traditional approaches for moving object segmentation usually appeal to motion analysis on pure image information without exploiting the robot's motor signals. We observe, however, that the background motion (from the robot's egocentric view) has a stronger correlation to the robot's motor signals than the motion of foreground objects. Based on this observation, we propose a novel approach to segmenting moving objects by learning homography and fundamental matrices from motor signals.

In building 2D3D and 3D3D object models, estimating camera motion parameters plays a key role. We propose a novel method for camera motion estimation that takes advantage of both planar features and point features and fuses constraints from both homography and essential matrices in a single probabilistic framework. Using planar features greatly improves estimation accuracy over using point features only, and with the help of point features, the solution ambiguity from a planar feature is resolved. Compared to the two classic approaches that apply the constraint of either the homography or the essential matrix, the proposed method gives more accurate estimation results and avoids the drawbacks of both.
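Both geometric constraints fused by this thesis are available in standard tooling. The sketch below is illustrative only: the thesis fuses them in a single probabilistic framework, whereas this snippet shows just the discrete disambiguation idea, using hypothetical matched-point arrays from a feature tracker:

```python
import cv2
import numpy as np

# pts_plane_*: matched pixels on one planar feature across two frames;
# pts_*: general point matches. All are hypothetical Nx2 float64 arrays.
def fuse_planar_and_point_motion(pts_plane_1, pts_plane_2, pts_1, pts_2, K):
    # Planar constraint: homography from the planar feature.
    H, _ = cv2.findHomography(pts_plane_1, pts_plane_2, cv2.RANSAC, 3.0)
    # Point constraint: essential matrix and a unique rotation/translation.
    E, _ = cv2.findEssentialMat(pts_1, pts_2, K, method=cv2.RANSAC)
    _, R_pts, t_pts, _ = cv2.recoverPose(E, pts_1, pts_2, K)

    # A homography decomposes into up to four (R, t, n) candidates;
    # the point-feature rotation resolves that ambiguity.
    _, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
    best = min(range(len(Rs)), key=lambda i: np.linalg.norm(Rs[i] - R_pts))
    return Rs[best], ts[best], normals[best]
```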
3

3D pose estimation of flying animals in multi-view video datasets

Breslav, Mikhail 04 December 2016
Flying animals such as bats, birds, and moths are actively studied by researchers wanting to better understand these animals' behavior and flight characteristics. Towards this goal, multi-view videos of flying animals have been recorded both in laboratory conditions and natural habitats. The analysis of these videos has shifted over time from manual inspection by scientists to more automated and quantitative approaches based on computer vision algorithms. This thesis describes a study on the largely unexplored problem of 3D pose estimation of flying animals in multi-view video data. This problem has received little attention in the computer vision community, where few flying animal datasets exist. Additionally, published solutions from researchers in the natural sciences have not taken full advantage of advancements in computer vision research. This thesis addresses this gap by proposing three different approaches for 3D pose estimation of flying animals in multi-view video datasets, which evolve from successful pose estimation paradigms used in computer vision. The first approach models the appearance of a flying animal with a synthetic 3D graphics model and then uses a Markov Random Field to model 3D pose estimation over time as a single optimization problem. The second approach builds on the success of Pictorial Structures models and further improves them for the case where only a sparse set of landmarks is annotated in training data. The proposed approach first discovers parts from regions of the training images that are not annotated. The discovered parts are then used to generate more accurate appearance likelihood terms, which in turn produce more accurate landmark localizations. The third approach takes advantage of the success of deep learning models and adapts existing deep architectures to perform landmark localization. Both the second and third approaches perform 3D pose estimation by first obtaining accurate localization of key landmarks in individual views, and then using calibrated cameras and camera geometry to reconstruct the 3D position of key landmarks. This thesis shows that the proposed algorithms generate first-of-their-kind and leading results on real-world datasets of bats and moths, respectively. Furthermore, a variety of resources are made freely available to the public to further strengthen the connection between research communities.
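The second and third approaches both end the same way: landmarks localized per view are lifted to 3D using the calibrated cameras. A minimal sketch of that reconstruction step, with hypothetical projection matrices and landmark pixels rather than the thesis datasets:

```python
import cv2
import numpy as np

# P1, P2: 3x4 camera projection matrices from calibration (K [R|t]).
# Identity intrinsics and a pure sideways translation are assumed here.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

# Matched landmark pixels per view, shape 2xN: row 0 = x, row 1 = y,
# one column per landmark (e.g. wingtip and head of a bat).
pts1 = np.array([[320.0, 400.0], [240.0, 260.0]])
pts2 = np.array([[300.0, 378.0], [238.0, 259.0]])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4xN homogeneous
X = (X_h[:3] / X_h[3]).T                          # Nx3 Euclidean landmarks
print(X)
```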
4

Enriching Remote Labs with Computer Vision and Drones

Khattar, Fawzi 13 December 2018
With technological advances, new learning technologies are being developed to contribute to a better learning experience. In particular, remote labs constitute an interesting and practical way to motivate today's students to learn. A student can, at any time and from anywhere, access the remote lab and do their lab work. Despite many advantages, remote technologies in education create a distance between the student and the teacher; without a teacher present, students can run into difficulties if no appropriate intervention is taken to help them. In this thesis, we aim to enrich an existing remote electronics lab for engineering students, called "LaboREM" (for remote laboratory), in two ways. First, we enable the student to send high-level commands to a mini-drone available in the remote lab facility. The objective is to examine the front panels of electronic measurement instruments with the camera embedded on the drone. The drone also allows remote student-teacher communication when a teacher is present in the remote lab facility. Finally, when the mission is over, the drone returns to land on a platform for automatic recharge of its batteries.
Second, we propose an automatic system that estimates the affective state of the student (frustrated, confused, in flow, ...) in order to take appropriate interventions that ensure good learning outcomes. For example, if the student is having major difficulties, we can give hints or reduce the difficulty level. We propose to do this using visual cues (head pose estimation and facial expression analysis). Many pieces of evidence about the student's state can be acquired, but each is incomplete, sometimes inaccurate, and does not cover every aspect of that state on its own. This is why we propose to fuse the evidence using Dempster-Shafer theory, which allows the fusion of incomplete evidence.
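Dempster's rule combines mass functions from independent sources and renormalizes away conflicting mass. A minimal sketch, with hypothetical masses for the two visual cues (the frame of discernment and the numbers are illustrative, not taken from the thesis):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass) with
    Dempster's rule, normalizing out the conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

# Hypothetical example: head pose and facial expression each assign
# partial belief over the frame {frustrated, confused, flow}.
theta = frozenset({"frustrated", "confused", "flow"})
head_pose = {frozenset({"frustrated", "confused"}): 0.6, theta: 0.4}
expression = {frozenset({"confused"}): 0.5, theta: 0.5}
print(dempster_combine(head_pose, expression))
# -> mass concentrates on {confused}, with residual uncertainty on theta
```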
5

3D POSE ESTIMATION IN THE CONTEXT OF GRIP POSITION FOR PHRI

Norman, Jacob January 2021
For human-robot interaction with the intent to grip a human arm, the ideal gripping location must be identified. In this work the gripping location is situated on the arm, so it can be extracted from the positions of the wrist and elbow joints. To achieve this, human pose estimation is proposed, as robust methods exist that work both inside and outside of lab environments. One such example is OpenPose, which, thanks to the COCO and MPII datasets, has recorded impressive real-time results in a variety of scenarios. However, most images in these datasets were taken from a camera mounted at chest height, showing people who are mostly upright. This raises the potential problem that prone humans, the primary focus of this project, cannot be detected, especially when seen from an angle that makes them appear upside down in the camera frame. To remedy this, two approaches were tested, both aimed at creating a rotation-invariant 2D pose estimation method. The first rotates the COCO training data in an attempt to train a model that finds humans regardless of their orientation in the image. The second adds a RotationNet as a preprocessing step to orient the images correctly, so that OpenPose can estimate the 2D pose before the resulting skeletons are rotated back.
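The orient-then-detect idea can be sketched as a wrapper around any 2D pose estimator. Here `estimate_pose` is a hypothetical callable (e.g. an OpenPose binding) returning keypoints and a confidence score; the fixed rotation grid and same-size warp (which crops corners at non-right angles) are simplifying assumptions:

```python
import cv2
import numpy as np

def rotation_robust_pose(image, estimate_pose, angles=(0, 90, 180, 270)):
    """Run estimate_pose(img) -> (keypoints Nx2, score) at several image
    orientations and map the best detection back to the original frame."""
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    best = None
    for angle in angles:
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(image, M, (w, h))
        keypoints, score = estimate_pose(rotated)
        if best is None or score > best[0]:
            # Invert the rotation to express keypoints in the input frame.
            M_inv = cv2.invertAffineTransform(M)
            pts = np.hstack([keypoints, np.ones((len(keypoints), 1))])
            best = (score, pts @ M_inv.T)
    return best[1]
```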
6

3D Pose estimation of continuously deformable instruments in robotic endoscopic surgery

Cabras, Paolo 24 February 2016
Knowing the 3D position of robotized instruments can be useful in a surgical context, e.g. for automatic control or gesture guidance. We propose two methods to infer the 3D pose of an instrument with a single bending section, equipped with colored markers, using only the images provided by the monocular camera embedded in the endoscope. A graph-based method is used to segment the markers, and their corners are extracted by detecting color transitions along Bézier curves fitted to edge points. These features are used to estimate the 3D pose of the instrument with an adaptive model that takes into account the mechanical play of the system. Since this approach can be affected by model uncertainties, the image-to-3D function can instead be learned from a training set. We opted for two techniques, both improved for this setting: a Radial Basis Function network with Gaussian kernels and locally weighted projection regression. The proposed methods are validated on a robotic experimental cell and on in-vivo sequences.
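The learned variant maps image measurements straight to pose. A minimal sketch of the RBF-network half under illustrative assumptions (random placeholder data, centers picked from the training set, one shared kernel width; not the thesis implementation):

```python
import numpy as np

def gaussian_design(X, centers, sigma):
    """Design matrix Phi[i, k] = exp(-||x_i - c_k||^2 / (2 sigma^2))."""
    d2 = np.square(X[:, None, :] - centers[None, :, :]).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(X, Y, centers, sigma, reg=1e-6):
    """Regularized least-squares fit of the output weights W."""
    Phi = gaussian_design(X, centers, sigma)
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(len(centers)), Phi.T @ Y)

def predict_rbf(X, centers, sigma, W):
    return gaussian_design(X, centers, sigma) @ W

# Hypothetical training set: image features (e.g. marker corners) -> pose.
rng = np.random.default_rng(0)
X = rng.random((200, 8))            # 8-D image features per frame
Y = rng.random((200, 6))            # 6-D pose (position + orientation)
centers = X[rng.choice(200, 20, replace=False)]
W = fit_rbf(X, Y, centers, sigma=0.5)
print(predict_rbf(X[:1], centers, 0.5, W))   # pose estimate for one frame
```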
7

Extraction of reproducible behaviors in a virtual avatar

Dare, Kodjine 10 1900
Given an image depicting a person, we (human beings) can visualize the different parts of the person in three dimensions despite the two-dimensional aspect of the image. This perceptual skill is mastered through years of analyzing humans. While such estimation is easy for human beings, it can be challenging for machines. 3D human pose estimation uses a 3D skeleton to represent the human body posture. In this thesis, we describe an approach that estimates poses from video with the objective of reproducing the observed movements with a virtual avatar. We pursue two main objectives. First, we extract initial 2D body-part coordinates using a method that predicts joint locations via part affinity fields (PAF), and then estimate 3D body-part coordinates with a full 3D human mesh reconstruction approach supplemented by the previously estimated 2D coordinates. Second, we explore the reconstruction of a virtual avatar from the extracted 3D coordinates so as to transfer human movements onto the animated avatar, which allows the behavioral dynamics of a human to be captured. Our approach consists of multiple subsequent stages, and supplementing the 3D estimation with 2D coordinates yields better results than comparable solutions. Finally, we apply a per-frame transfer of the positions to the skeleton of a virtual avatar in order to reproduce the movements extracted from the video.
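The final transfer step can be sketched as simple direction-preserving retargeting: keep each estimated bone's direction per frame but substitute the avatar's own bone lengths. The joint names, skeleton, and helper below are hypothetical, not the thesis implementation:

```python
import numpy as np

# Hypothetical skeleton fragment: child joint -> parent joint (None = root).
# Insertion order guarantees parents are rebuilt before their children.
PARENTS = {"hips": None, "spine": "hips", "head": "spine",
           "l_shoulder": "spine", "l_elbow": "l_shoulder",
           "l_wrist": "l_elbow"}

def bone_directions(joints3d):
    """Unit bone vectors from one frame of estimated 3D joint positions.
    joints3d: dict joint-name -> (3,) numpy array."""
    dirs = {}
    for child, parent in PARENTS.items():
        if parent is None:
            continue
        v = joints3d[child] - joints3d[parent]
        dirs[child] = v / (np.linalg.norm(v) + 1e-8)
    return dirs

def retarget(joints3d, avatar_bone_lengths, root=np.zeros(3)):
    """Rebuild avatar joints: estimated directions, avatar bone lengths."""
    dirs = bone_directions(joints3d)
    out = {"hips": root}
    for child, parent in PARENTS.items():
        if parent is None:
            continue
        out[child] = out[parent] + avatar_bone_lengths[child] * dirs[child]
    return out
```

Applied frame by frame, this reproduces the extracted motion on a skeleton whose proportions differ from the filmed person's.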
8

Modulating Depth Map Features to Estimate 3D Human Pose via Multi-Task Variational Autoencoders

Moerman, Kobe January 2023
Human pose estimation (HPE) constitutes a fundamental problem within the domain of computer vision, finding applications in diverse fields like motion analysis and human-computer interaction. This thesis introduces innovative methodologies aimed at enhancing the accuracy and robustness of 3D joint estimation. Through the integration of Variational Autoencoders (VAEs), pertinent information is extracted from depth maps, even in the presence of inevitable image-capturing inconsistencies. This concept is enhanced through the introduction of noise to the body or to specific regions surrounding key joints: deliberately adding noise to these areas enables the VAE to acquire a robust representation that captures authentic pose-related patterns. Moreover, introducing a localised mask as a constraint in the loss function ensures the model predominantly relies on pose-related cues while disregarding potential confounding factors that would hinder a compact representation of accurate human pose information. Delving further into latent space modulation, a novel model architecture is devised, joining a VAE and a fully connected network under a multi-task joint training objective. In this framework, the VAE and the regressor jointly shape the latent representations for accurate joint detection and localisation. By combining the multi-task model with the loss function constraint, this study attains results that compete with state-of-the-art techniques. These findings underscore the significance of leveraging latent space modulation and customised loss functions to address challenging human poses, and they pave the way for future work on HPE. Subsequent research may optimise these techniques, evaluate their performance across diverse datasets, and explore potential extensions to unravel further insights and advancements in the field.
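The multi-task objective described here can be sketched as a VAE whose latent code feeds both a depth-map decoder and a joint regressor, with a localised mask weighting the reconstruction loss. All layer sizes, the mask semantics, and the loss weights below are illustrative assumptions, not the thesis configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskVAE(nn.Module):
    """Sketch: one encoder, a depth-map decoder, and a 3D-joint regressor
    sharing the latent code under a joint training objective."""
    def __init__(self, in_dim=64 * 64, latent=32, n_joints=17):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))
        self.reg = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                 nn.Linear(128, n_joints * 3))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), self.reg(z), mu, logvar

def loss_fn(recon, x, joints_pred, joints_gt, mu, logvar, mask, beta=1e-3):
    # Localised mask: weight reconstruction around key joints so the latent
    # code favours pose-related structure over background pixels.
    rec = (mask * (recon - x) ** 2).sum(dim=1).mean()
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    pose = F.mse_loss(joints_pred, joints_gt)   # multi-task regression term
    return rec + beta * kld + pose
```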
