Global ETD Search

101	Learning to Predict Dense Correspondences for 6D Pose Estimation Brachmann, Eric 17 January 2018 (has links) Object pose estimation is an important problem in computer vision with applications in robotics, augmented reality and many other areas. An established strategy for object pose estimation consists of, firstly, finding correspondences between the image and the object’s reference frame, and, secondly, estimating the pose from outlier-free correspondences using Random Sample Consensus (RANSAC). The first step, namely finding correspondences, is difficult because object appearance varies depending on perspective, lighting and many other factors. Traditionally, correspondences have been established using handcrafted methods like sparse feature pipelines. In this thesis, we introduce a dense correspondence representation for objects, called object coordinates, which can be learned. By learning object coordinates, our pose estimation pipeline adapts to various aspects of the task at hand. It works well for diverse object types, from small objects to entire rooms, varying object attributes, like textured or texture-less objects, and different input modalities, like RGB-D or RGB images. The concept of object coordinates allows us to easily model and exploit uncertainty as part of the pipeline such that even repeating structures or areas with little texture can contribute to a good solution. Although we can train object coordinate predictors independent of the full pipeline and achieve good results, training the pipeline in an end-to-end fashion is desirable. It enables the object coordinate predictor to adapt its output to the specificities of following steps in the pose estimation pipeline. Unfortunately, the RANSAC component of the pipeline is non-differentiable which prohibits end-to-end training. Adopting techniques from reinforcement learning, we introduce Differentiable Sample Consensus (DSAC), a formulation of RANSAC which allows us to train the pose estimation pipeline in an end-to-end fashion by minimizing the expectation of the final pose error. info:eu-repo/classification/ddc/004 ddc:004
102	Hypothesis Generation for Object Pose Estimation From local sampling to global reasoning Michel, Frank 14 February 2019 (has links) Pose estimation has been studied since the early days of computer vision. The task of object pose estimation is to determine the transformation that maps an object from it's inherent coordinate system into the camera-centric coordinate system. This transformation describes the translation of the object relative to the camera and the orientation of the object in three dimensional space. The knowledge of an object's pose is a key ingredient in many application scenarios like robotic grasping, augmented reality, autonomous navigation and surveillance. A general estimation pipeline consists of the following four steps: extraction of distinctive points, creation of a hypotheses pool, hypothesis verification and, finally, the hypotheses refinement. In this work, we focus on the hypothesis generation process. We show that it is beneficial to utilize geometric knowledge in this process. We address the problem of hypotheses generation of articulated objects. Instead of considering each object part individually we model the object as a kinematic chain. This enables us to use the inner-part relationships when sampling pose hypotheses. Thereby we only need K correspondences for objects consisting of K parts. We show that applying geometric knowledge about part relationships improves estimation accuracy under severe self-occlusion and low quality correspondence predictions. In an extension we employ global reasoning within the hypotheses generation process instead of sampling 6D pose hypotheses locally. We therefore formulate a Conditional-Random-Field operating on the image as a whole inferring those pixels that are consistent with the 6D pose. Within the CRF we use a strong geometric check that is able to assess the quality of correspondence pairs. We show that our global geometric check improves the accuracy of pose estimation under heavy occlusion. info:eu-repo/classification/ddc/004 ddc:004
103	Pose Estimation using Implicit Functions and Uncertainty in 3D Blomstedt, Frida January 2023 (has links) Human pose estimation in 3D is a large area within computer vision, with many application areas. A common approach is to first estimate the pose in 2D, resulting in a confidence heatmap, and then estimate the 3D pose using the most likely estimations in 2D. This may, however, cause problems in cases where pose estimates are more uncertain and the estimation of one point is far from the true position, for example when a limb is occluded. This thesis adapts the method Neural Radiance Fields (NeRF) to 2D confidence heatmaps in order to create an implicit representation of the uncertainty in 3D, thus attempting to make use of as much information in 2D as possible. The adapted method was evaluated on the Human3.6M dataset, and results show that this method outperforms a simple triangulation baseline, especially when the estimation in 2D is far from the true pose. pose estimation neural radiance fields nerf computer vision machine learning
104	3D-Reconstruction of the Common Murre / 3D-Rekonstruering av Sillgrissla Hägerlind, Johannes January 2023 (has links) Automatic 3D reconstruction of birds can aid researchers in studying their behavior. Recently there has been an attempt to reconstruct a variety of birds from single-view images. However, the common murre's appearance is different from the birds that have been studied. Moreover, recent studies have focused on side views. This thesis studies the 3D reconstruction of the common murre from single-view top-view images. A template mesh is first optimized to fit a 3D scan. Then the result is used to optimize a species-specific mean from side-view images annotated with keypoints and silhouettes. The resulting mean mesh is used to initialize the optimization for top-down images. Using a mask loss, a pose prior loss, and a bone length loss that uses a mean vector from the side-view images improves the 3D reconstruction as rated by humans. Furthermore, the intersection over union (IoU) and percentage of correct keypoint (PCK), although used by other authors, are insufficient in a single-view top-view setting. pose estimation shape estimation 3D reconstruction articulated mesh bird common murre
105	Polarimetric Imagery for Object Pose Estimation Siefring, Matthew D. 15 May 2023 (has links) No description available. Electrical Engineering Optics Polarimetric Imagery visible-spectrum deep-learning object pose estimation CNN late-fusion Stokes-products dataset
106	Scanning Laser Registration and Structural Energy Density Based Active Structural Acoustic Control Manwill, Daniel Alan 17 December 2010 (has links) (PDF) To simplify the measurement of energy-based structural metrics, a general registration process for the scanning laser doppler vibrometer (SLDV) has been developed. Existing registration techniques, also known as pose estimation or position registration, suffer from mathematical complexity, instrument specificity, and the need for correct optimization initialization. These difficulties have been addressed through development of a general linear laser model and hybrid registration algorithm. These are applicable to any SLDV and allow the registration problem to be solved using straightforward mathematics. Additionally, the hybrid registration algorithm eliminates the need for correct optimization initialization by separating the optimization process from solution selection. The effectiveness of this approach is demonstrated through simulated application and by validation measurements performed on a specially prepared pipe. To increase understanding of the relationships between structural energy metrics and the acoustic response, the use of structural energy density (SED) in active structural acoustic control (ASAC) has also been studied. A genetic algorithm and other simulations were used to determine achievable reduction in acoustic radiation, characterize control system design, and compare SED-based control with the simpler velocity-based control. Using optimized sensor and actuator placements at optimally excited modal frequencies, attenuation of net acoustic intensity was proportional to attenuation of SED. At modal and non-modal frequencies, optimal SED-based ASAC system design is guided by establishing general symmetry between the structural disturbing force and the SED sensor and control actuator. Using fixed sensor and actuator placement, SED-based control has been found to provide superior performance to single point velocity control and very comparable performance to two-point velocity control. Its greatest strength is that it rarely causes unwanted amplifications of large amplitude when properly designed. Genetic algorithm simulations of SED-based ASAC indicated that optimal control effectiveness is obtained when sensors and actuators function in more than one role. For example, an actuator can be placed to simultaneously reduce structural vibration amplitude and reshape the response such that it radiates less efficiently. These principles can be applied to the design of any type of ASAC system. Daniel Manwill scanning laser doppler vibrometer registration pose estimation active structural acoustic control structural vibration acoustics genetic algorithm Mechanical Engineering
107	Synthetic Data Generation for 6D Object Pose and Grasping Estimation Martínez González, Pablo 16 March 2023 (has links) Teaching a robot how to behave so it becomes completely autonomous is not a simple task. When robotic systems become truly intelligent, interactions with them will feel natural and easy, but nothing could be further from truth. Make a robot understand its surroundings is a huge task that the computer vision field tries to address, and deep learning techniques are bringing us closer. But at the cost of the data. Synthetic data generation is the process of generating artificial data that is used to train machine learning models. This data is generated using computer algorithms and simulations, and is designed to resemble real-world data as closely as possible. The use of synthetic data has become increasingly popular in recent years, particularly in the field of deep learning, due to the shortage of high-quality annotated real-world data and the high cost of collecting it. For that reason, in this thesis we are addressing the task of facilitating the generation of synthetic data with the creation of a framework which leverages advances in modern rendering engines. In this context, the generated synthetic data can be used to train models for tasks such as 6D object pose estimation or grasp estimation. 6D object pose estimation refers to the problem of determining the position and orientation of an object in 3D space, while grasp estimation involves predicting the position and orientation of a robotic hand or gripper that can be used to pick up and manipulate the object. These are important tasks in robotics and computer vision, as they enable robots to perform complex manipulation and grasping tasks. In this work we propose a way of extracting grasping information from hand-object interactions in virtual reality, so that synthetic data can also boost research in that area. Finally, we use this synthetically generated data to test the proposal of applying 6D object pose estimation architectures to grasping region estimation. This idea is based on both problems sharing several underlying concepts such as object detection and orientation. / Enseñar a un robot a ser completamente autónomo no es tarea fácil. Cuando los sistemas robóticos sean realmente inteligentes, las interacciones con ellos parecerán naturales y fáciles, pero nada más lejos de la realidad. Hacer que un robot comprenda y asimile su entorno es una difícil cruzada que el campo de la visión por ordenador intenta abordar, y las técnicas de aprendizaje profundo nos están acercando al objetivo. Pero el precio son los datos. La generación de datos sintéticos es el proceso de generar datos artificiales que se utilizan para entrenar modelos de aprendizaje automático. Estos datos se generan mediante algoritmos informáticos y simulaciones, y están diseñados para parecerse lo más posible a los datos del mundo real. El uso de datos sintéticos se ha vuelto cada vez más popular en los últimos años, especialmente en el campo del aprendizaje profundo, debido a la escasez de datos reales anotados de alta calidad y al alto coste de su recopilación. Por ello, en esta tesis abordamos la tarea de facilitar la generación de datos sintéticos con la creación de una herramienta que aprovecha los avances de los motores modernos de renderizado. En este contexto, los datos sintéticos generados pueden utilizarse para entrenar modelos para tareas como la estimación de la pose 6D de objetos o la estimación de agarres. La estimación de la pose 6D de objetos se refiere al problema de determinar la posición y orientación de un objeto en el espacio 3D, mientras que la estimación del agarre implica predecir la posición y orientación de una mano robótica o pinza que pueda utilizarse para coger y manipular el objeto. Se trata de tareas importantes en robótica y visión por computador, ya que permiten a los robots realizar tareas complejas de manipulación y agarre. En este trabajo proponemos una forma de extraer información de agarres a partir de interacciones mano-objeto en realidad virtual, de modo que los datos sintéticos también puedan impulsar la investigación en esa área. Por último, utilizamos estos datos generados sintéticamente para poner a prueba la propuesta de aplicar arquitecturas de estimación de pose 6D de objetos a la estimación de regiones de agarre. Esta propuesta se basa en que ambos problemas comparten varios conceptos subyacentes, como la detección y orientación de objetos. / This thesis has been funded by the Spanish Ministry of Education [FPU17/00166] 6D Object Pose estimation Synthetic Data Sim2real Grasping Virtual Reality / Estimación de pose 6D Datos Sintéticos Simulación a real Agarres Realidad Virtual
108	Take the Lead: Toward a Virtual Video Dance Partner Farris, Ty 01 August 2021 (has links) (PDF) My work focuses on taking a single person as input and predicting the intentional movement of one dance partner based on the other dance partner's movement. Human pose estimation has been applied to dance and computer vision, but many existing applications focus on a single individual or multiple individuals performing. Currently there are very few works that focus specifically on dance couples combined with pose prediction. This thesis is applicable to the entertainment and gaming industry by training people to dance with a virtual dance partner. Many existing interactive or virtual dance partners require a motion capture system, multiple cameras or a robot which creates an expensive cost. This thesis does not use a motion capture system and combines OpenPose with swing dance YouTube videos to create a virtual dance partner. By taking in the current dancer's moves as input, the system predicts the dance partner's corresponding moves in the video frames. In order to create a virtual dance partner, datasets that contain information about the skeleton keypoints are necessary to predict a dance partner's pose. There are existing dance datasets for a specific type of dance, but these datasets do not cover swing dance. Furthermore, the dance datasets that do include swing have a limited number of videos. The contribution of this thesis is a large swing dataset that contains three different types of swing dance: East Coast, Lindy Hop and West Coast. I also provide a basic framework to extend the work to create a real-time and interactive dance partner. Computer Vision Dance Interactive Dance Partner Swing Dance Human Pose Estimation Machine Learning Artificial Intelligence and Robotics Data Science
109	Reinforcement learning for robotic manipulation / Reinforcement learning för manipulering med robot Arnekvist, Isac January 2017 (has links) Reinforcement learning was recently successfully used for real-world robotic manipulation tasks, without the need for human demonstration, usinga normalized advantage function-algorithm (NAF). Limitations on the shape of the advantage function however poses doubts to what kind of policies can be learned using this method. For similar tasks, convolutional neural networks have been used for pose estimation from images taken with fixed position cameras. For some applications however, this might not be a valid assumption. It was also shown that the quality of policies for robotic tasks severely deteriorates from small camera offsets. This thesis investigates the use of NAF for a pushing task with clear multimodal properties. The results are compared with using a deterministic policy with minimal constraints on the Q-function surface. Methods for pose estimation using convolutional neural networks are further investigated, especially with regards to randomly placed cameras with unknown offsets. By defining the coordinate frame of objects with respect to some visible feature, it is hypothesized that relative pose estimation can be accomplished even when the camera is not fixed and the offset is unknown. NAF is successfully implemented to solve a simple reaching task on a real robotic system where data collection is distributed over several robots, and learning is done on a separate server. Using NAF to learn a pushing task fails to converge to a good policy, both on the real robots and in simulation. Deep deterministic policy gradient (DDPG) is instead used in simulation and successfully learns to solve the task. The learned policy is then applied on the real robots and accomplishes to solve the task in the real setting as well. Pose estimation from fixed position camera images is learned and the policy is still able to solve the task using these estimates. By defining a coordinate frame from an object visible to the camera, in this case the robot arm, a neural network learns to regress the pushable objects pose in this frame without the assumption of a fixed camera. However, the precision of the predictions were too inaccurate to be used for solving the pushing task. Further modifications to this approach could however show to be a feasible solution to randomly placed cameras with unknown poses. / Reinforcement learning har nyligen använts framgångsrikt för att lära icke-simulerade robotar uppgifter med hjälp av en normalized advantage function-algoritm (NAF), detta utan att använda mänskliga demonstrationer. Restriktioner på funktionsytorna som använts kan dock visa sig vara problematiska för generalisering till andra uppgifter. För poseestimering har i liknande sammanhang convolutional neural networks använts med bilder från kamera med konstant position. I vissa applikationer kan dock inte kameran garanteras hålla en konstant position och studier har visat att kvaliteten på policys kraftigt förvärras när kameran förflyttas. Denna uppsats undersöker användandet av NAF för att lära in en ”pushing”-uppgift med tydliga multimodala egenskaper. Resultaten jämförs med användandet av en deterministisk policy med minimala restriktioner på Q-funktionsytan. Vidare undersöks användandet av convolutional neural networks för pose-estimering, särskilt med hänsyn till slumpmässigt placerade kameror med okänd placering. Genom att definiera koordinatramen för objekt i förhållande till ett synligt referensobjekt så tros relativ pose-estimering kunna utföras även när kameran är rörlig och förflyttningen är okänd. NAF appliceras i denna uppsats framgångsrikt på enklare problem där datainsamling är distribuerad över flera robotar och inlärning sker på en central server. Vid applicering på ”pushing”- uppgiften misslyckas dock NAF, både vid träning på riktiga robotar och i simulering. Deep deterministic policy gradient (DDPG) appliceras istället på problemet och lär sig framgångsrikt att lösa problemet i simulering. Den inlärda policyn appliceras sedan framgångsrikt på riktiga robotar. Pose-estimering genom att använda en fast kamera implementeras också framgångsrikt. Genom att definiera ett koordinatsystem från ett föremål i bilden med känd position, i detta fall robotarmen, kan andra föremåls positioner beskrivas i denna koordinatram med hjälp av neurala nätverk. Dock så visar sig precisionen vara för låg för att appliceras på robotar. Resultaten visar ändå att denna metod, med ytterligare utökningar och modifikationer, skulle kunna lösa problemet.
110	Fast Recognition and Pose Estimation for the Purpose of Bin-Picking Robotics Lonsberry, Alexander J. January 2011 (has links) No description available. Robotics Object Recognition 3D 3-D pose estimation automatic correlation bin-picking robotics bin picking vision system

Search results