91

Compact Representations and Multi-cue Integration for Robotics

Söderberg, Robert January 2005 (has links)
<p>This thesis presents methods useful in a bin picking application, such as detection and representation of local features, pose estimation and multi-cue integration.</p><p>The scene tensor is a representation of multiple line or edge segments and was first introduced by Nordberg in [30]. A method for estimating scene tensors from gray-scale images is presented. The method is based on orientation tensors, where the scene tensor can be estimated by correlations of the elements in the orientation tensor with a number of 1<em>D</em> filters. Mechanisms for analyzing the scene tensor are described and an algorithm for detecting interest points and estimating feature parameters is presented. It is shown that the algorithm works on a wide spectrum of images with good results.</p><p>Representations that are invariant with respect to a set of transformations are useful in many applications, such as pose estimation, tracking and wide baseline stereo. The scene tensor itself is not invariant, and three different methods for implementing an invariant representation based on the scene tensor are presented. One is based on a non-linear transformation of the scene tensor and is invariant to perspective transformations. Two versions of a tensor doublet are presented, based on the geometry of two interest points and invariant to translation, rotation and scaling. The tensor doublet is used in a framework for view-centered pose estimation of 3<em>D</em> objects. It is shown that the pose estimation algorithm performs well even when the object is occluded and has a different scale compared to the training situation.</p><p>An industrial implementation of a bin picking application has to cope with several different types of objects. All pose estimation algorithms use some kind of model, and there is yet no model that can cope with all kinds of situations and objects. This thesis presents a method for integrating cues from several pose estimation algorithms to increase system stability. It is also shown that the same framework can be used to increase the accuracy of the system by using cues from several views of the object. An extensive test with several different objects, lighting conditions and backgrounds shows that multi-cue integration makes the system more robust and increases its accuracy.</p><p>Finally, a system for bin picking is presented, built from the previous parts of this thesis. An eye-in-hand setup is used with a standard industrial robot arm. It is shown that the system works in real bin-picking situations, with a positioning error below 1 mm and an orientation error below 1<sup>o</sup> for most of the different situations.</p> / Report code: LiU-TEK-LIC-2005:15.
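The similarity invariance of the tensor doublet described above can be illustrated with a minimal sketch (hypothetical helper names and a 2D simplification, not the thesis implementation): encoding each interest point's local orientation relative to the baseline joining the two points removes the effects of translation, rotation and uniform scaling.

```python
import math

def doublet_descriptor(p1, theta1, p2, theta2):
    """Similarity-invariant descriptor for a pair of interest points.

    p1, p2: (x, y) positions; theta1, theta2: local orientation angles.
    Subtracting the baseline direction phi from each orientation removes
    the effect of translation, rotation and uniform scaling.
    """
    phi = math.atan2(p2[1] - p1[1], p2[0] - p1[0])  # baseline direction
    return ((theta1 - phi) % (2 * math.pi),
            (theta2 - phi) % (2 * math.pi))

def transform(p, angle, scale, t):
    """Apply a similarity transform (rotate, scale, translate) to a 2D point."""
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * p[0] - s * p[1]) + t[0],
            scale * (s * p[0] + c * p[1]) + t[1])
```

Transforming both points (and rotating their orientations accordingly) leaves the descriptor unchanged, which is the property the pose estimation framework exploits.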
92

Acquiring 3D Full-body Motion from Noisy and Ambiguous Input

Lou, Hui May 2012 (has links)
Natural human motion is in high demand and widely used in a variety of applications such as video games and virtual reality. However, acquisition of full-body motion remains challenging because the system must be capable of accurately capturing a wide variety of human actions while not requiring a considerable amount of time and skill to assemble. For instance, commercial optical motion capture systems such as Vicon can capture human motion with high accuracy and resolution, but they often require post-processing by experts, which is time-consuming and costly. Microsoft Kinect, despite its high popularity and wide range of applications, does not provide accurate reconstruction of complex movements when significant occlusions occur. This dissertation explores two different approaches that accurately reconstruct full-body human motion from noisy and ambiguous input data captured by commercial motion capture devices. The first approach automatically generates high-quality human motion from noisy data obtained from commercial optical motion capture systems, eliminating the need for post-processing. The second approach accurately captures a wide variety of human motion even under significant occlusions by using color/depth data captured by a single Kinect camera. The common theme that underlies the two approaches is the use of prior knowledge embedded in a pre-recorded motion capture database to reduce the reconstruction ambiguity caused by noisy and ambiguous input and to constrain the solution to lie in the natural motion space. More specifically, the first approach constructs a series of spatial-temporal filter bases from pre-captured human motion data and employs them along with robust statistics techniques to filter motion data corrupted by noise and outliers. The second approach formulates the problem in a Maximum a Posteriori (MAP) framework and generates the most likely pose that explains the observations and is consistent with the patterns embedded in the pre-recorded motion capture database. We demonstrate the effectiveness of our approaches through extensive numerical evaluations on synthetic data and comparisons against results created by commercial motion capture systems. The first approach can effectively denoise a wide variety of noisy motion data, including walking, running, jumping and swimming, while the second approach is shown to be capable of accurately reconstructing a wider range of motions compared with Microsoft Kinect.
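The MAP idea described in this abstract can be illustrated for a single scalar pose parameter, assuming a Gaussian observation likelihood and a Gaussian prior standing in for the motion capture database (all names and numbers are illustrative, not the dissertation's actual model):

```python
def map_estimate(observation, obs_sigma, prior_mean, prior_sigma):
    """MAP estimate for a scalar pose parameter.

    With likelihood N(observation, obs_sigma^2) and prior
    N(prior_mean, prior_sigma^2), the posterior mode is a
    precision-weighted average of observation and prior mean.
    """
    w_obs = 1.0 / obs_sigma ** 2   # precision of the observation
    w_pri = 1.0 / prior_sigma ** 2  # precision of the database prior
    return (w_obs * observation + w_pri * prior_mean) / (w_obs + w_pri)
```

The behavior mirrors the abstract's point: when the observation is unreliable (large `obs_sigma`, e.g. an occluded joint), the estimate is pulled toward the prior learned from pre-recorded motion.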
93

Statistical and geometric methods for visual tracking with occlusion handling and target reacquisition

Lee, Jehoon 17 January 2012 (has links)
Computer vision is the science that studies how machines understand scenes and automatically make decisions based on meaningful information extracted from an image or multi-dimensional data of the scene, much like human vision. One common and well-studied field of computer vision is visual tracking, a challenging and active research area in the computer vision community. Visual tracking is the task of continuously estimating the pose of an object of interest against the background in consecutive frames of an image sequence. It is a ubiquitous task and a fundamental technology of computer vision that provides low-level information used for high-level applications such as visual navigation, human-computer interaction, and surveillance systems. The focus of the research in this thesis is visual tracking and its applications. More specifically, the objective of this research is to design a reliable tracking algorithm for a deformable object that is robust to clutter and capable of occlusion handling and target reacquisition in realistic tracking scenarios, using statistical and geometric methods. To this end, the approaches developed in this thesis make extensive use of region-based active contours and particle filters in a variational framework. In addition, to deal with occlusion and target reacquisition problems, we exploit the benefits of coupling 2D and 3D information of an image and an object. In this thesis, first, we present an approach for tracking a moving object based on 3D range information in stereoscopic temporal imagery by combining particle filtering and geometric active contours. Range information is weighted by the proposed Gaussian weighting scheme to improve the segmentation achieved by active contours. In addition, this work presents an on-line shape learning method based on principal component analysis to reacquire the track of an object in the event that it disappears from the field of view and reappears later. Second, we propose an approach to jointly track a rigid object in a 2D image sequence and estimate its pose in 3D space. In this work, we take advantage of knowledge of a 3D model of the object and employ particle filtering to generate and propagate the translation and rotation parameters in a decoupled manner. Moreover, to continuously track the object in the presence of occlusions, we propose an occlusion detection and handling scheme based on controlling the degree of dependence between predictions and measurements of the system. Third, we introduce a fast level-set-based algorithm applicable to real-time applications. In this algorithm, a contour-based tracker is improved in terms of computational complexity, and the tracker performs real-time curve evolution for detecting multiple windows. Lastly, we deal with rapid human motion in the context of object segmentation and visual tracking. Specifically, we introduce a model-free and marker-less approach for human body tracking based on a dynamic color model and geometric information of a human body from a monocular video sequence. The contributions of this thesis are summarized as follows:
1. A reliable algorithm to track deformable objects in a sequence consisting of 3D range data, combining particle filtering and statistics-based active contour models.
2. An effective handling scheme, based on the object's 2D shape information, for the challenging situation in which the tracked object disappears completely from the image domain during tracking.
3. A robust 2D-3D pose tracking algorithm using a 3D shape prior and particle filters on SE(3).
4. An occlusion handling scheme based on the degree of trust between predictions and measurements of the tracking system, controlled in an online fashion.
5. Fast level-set-based active contour models applicable to real-time object detection.
6. A model-free and marker-less approach for tracking rapid human motion based on a dynamic color model and geometric information of a human body.
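The particle filtering that this thesis combines with active contours can be sketched in its simplest bootstrap form for a one-dimensional state (the motion and observation noise parameters are illustrative, not those of the thesis):

```python
import math
import random

def particle_filter_step(particles, weights, observation,
                         motion_sigma=0.1, obs_sigma=0.5):
    """One predict-update-resample cycle of a bootstrap particle filter.

    particles: list of scalar state hypotheses; weights: their weights.
    Returns resampled particles with uniform weights.
    """
    # Predict: propagate each particle through a random-walk motion model.
    predicted = [p + random.gauss(0.0, motion_sigma) for p in particles]
    # Update: reweight by a Gaussian observation likelihood.
    new_w = [w * math.exp(-0.5 * ((observation - p) / obs_sigma) ** 2)
             for p, w in zip(predicted, weights)]
    total = sum(new_w)
    if total == 0.0:  # all likelihoods underflowed: fall back to uniform
        new_w = [1.0 / len(predicted)] * len(predicted)
    else:
        new_w = [w / total for w in new_w]
    # Resample: draw particles proportionally to their weights.
    resampled = random.choices(predicted, weights=new_w, k=len(predicted))
    return resampled, [1.0 / len(resampled)] * len(resampled)
```

After a few cycles the particle cloud concentrates around states consistent with the observations, which is what makes the filter suitable for propagating pose hypotheses.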
94

Visual Perception of Objects and their Parts in Artificial Systems

Schoeler, Markus 12 October 2015 (has links)
No description available.
95

Detection and pose estimation of instances of a rigid object for robotic bin-picking

Brégier, Romain 11 June 2018 (has links)
Visual object detection and estimation of object poses -- i.e. position and orientation for a rigid object -- is of utmost interest for automatic scene understanding. In this thesis, we address this topic through the bin-picking scenario, in which instances of a rigid object have to be automatically detected and localized in bulk, so that a robot can manipulate them for various industrial tasks such as machine feeding, assembling and packing. To this end, we propose a novel method for object detection and pose estimation from an input depth image, based on the aggregation of local predictions through a Hough forest technique, compatible with the industrial constraints of cycle time, performance and ease of use. Overcoming the limitations of existing approaches, which assume objects have no proper symmetries, we develop a theoretical and practical framework enabling us to consider any physically admissible rigid object, thanks to a novel definition of the notion of pose and an associated distance quantifying the length of the smallest displacement between two poses. This framework provides tools to handle poses efficiently in operations such as pose averaging or neighborhood queries, and is based on rigorous mathematical developments. Evaluation benchmarks used in the literature are not very representative of our application scenario and suffer from some intrinsic limitations, so we formalize an evaluation methodology suited to scenes in which many object instances, partially occluded and in arbitrary poses, may be present. We apply this methodology to real and synthetic data and demonstrate the soundness of our approach compared to the state of the art.
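The idea of a pose distance tolerant to object symmetries can be sketched as follows, using a plain geodesic rotation distance minimized over a discrete symmetry group (a simplification: the thesis's metric also accounts for the object's geometry and translation):

```python
import math

def rot_z(a):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_T(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def geodesic_angle(R1, R2):
    """Angle of the relative rotation R1^T R2, in radians."""
    R = mat_mul(mat_T(R1), R2)
    tr = R[0][0] + R[1][1] + R[2][2]
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))

def sym_distance(R1, R2, symmetries):
    """Rotation distance modulo the object's proper symmetry group:
    the smallest displacement over all symmetry-equivalent poses."""
    return min(geodesic_angle(mat_mul(R1, S), R2) for S in symmetries)
```

For an object with a 2-fold symmetry about z, `symmetries = [rot_z(0), rot_z(math.pi)]`, and two poses differing by a half-turn about that axis are correctly reported as identical.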
96

Detection of object position

Baáš, Filip January 2019 (has links)
This Master’s thesis deals with object pose estimation using a monocular camera. An object is considered to be any rigid entity with a fixed shape and strong edges, ideally textureless. The object position in this work is represented by a transformation matrix, which describes the object’s translation and rotation relative to a world coordinate system. The first chapter is dedicated to the theory of geometric transformations and the intrinsic and extrinsic parameters of a camera. It also describes the Chamfer matching detection algorithm, which is used in this work. The second chapter describes the development tools used in this work. The third, fourth and fifth chapters are dedicated to the practical realization of the work’s goal and the achieved results. The last chapter describes the created application, which performs pose estimation of a known object in a scene.
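The transformation-matrix representation of pose mentioned above can be sketched with a minimal 4x4 homogeneous transform (rotation about z plus translation; the helper names and values are illustrative, not the thesis's code):

```python
import math

def make_pose(angle_z, t):
    """4x4 homogeneous transform: rotation about z, then translation t."""
    c, s = math.cos(angle_z), math.sin(angle_z)
    return [[c, -s, 0.0, t[0]],
            [s,  c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def apply(T, p):
    """Map a 3D point (tuple) through the homogeneous transform."""
    ph = p + (1.0,)
    return tuple(sum(T[i][j] * ph[j] for j in range(4)) for i in range(3))

def invert(T):
    """Invert a rigid transform: R^T on the block, -R^T t on the translation."""
    R = [[T[j][i] for j in range(3)] for i in range(3)]  # transposed rotation
    t = [-sum(R[i][j] * T[j][3] for j in range(3)) for i in range(3)]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]
```

The inverse transform maps camera (or world) coordinates back into the object's frame, which is the direction needed when comparing an estimated pose against a model.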
97

Evaluation of 3D motion capture data from a deep neural network combined with a biomechanical model

Rydén, Anna, Martinsson, Amanda January 2021 (has links)
Motion capture has in recent years grown in interest in many fields, from the game industry to sports analysis. The need for reflective markers and expensive multi-camera systems limits adoption, since such systems are costly and time-consuming to use. One solution to this could be a deep neural network trained to extract 3D joint estimates from a 2D video captured with a smartphone. This master thesis project has investigated the accuracy of a trained convolutional neural network, MargiPose, that estimates 25 joint positions in 3D from a 2D video, against a gold-standard multi-camera Vicon system. The project has also investigated whether the data from the deep neural network can be connected to a biomechanical modelling software, AnyBody, for further analysis. The final intention of this project was to analyze how accurate such a combination could be in golf swing analysis. The accuracy of the deep neural network has been evaluated with three parameters: marker position, angular velocity and kinetic energy for different segments of the human body. MargiPose delivers results with high accuracy (Mean Per Joint Position Error (MPJPE) = 1.52 cm) for simpler movements, but for a more advanced motion such as a golf swing, MargiPose achieves lower accuracy in marker distance (MPJPE = 3.47 cm). The mean difference in angular velocity shows that MargiPose has difficulties following segments that are occluded or have greater motion, such as the wrists in a golf swing, where they both move fast and are occluded by other body segments. The conclusion of this research is that it is possible to connect data from a trained CNN with a biomechanical modelling software. The accuracy of the network is highly dependent on the intended use of the data. For the purpose of golf swing analysis, this could be a cost-effective solution that enables motion analysis for professionals as well as interested beginners. MargiPose shows high accuracy when evaluating simple movements. However, when using it with the intention of analyzing a golf swing in a biomechanical modelling software, the outcome might be beyond the bounds of reliable results.
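The MPJPE metric used in the evaluation above can be sketched as follows, assuming predicted and reference joints are already expressed in a common coordinate frame (no alignment step shown):

```python
import math

def mpjpe(predicted, ground_truth):
    """Mean Per Joint Position Error: the average Euclidean distance
    between corresponding predicted and reference 3D joint positions."""
    assert len(predicted) == len(ground_truth)
    dists = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(dists) / len(dists)
```

For example, two joints off by 1 cm and 2 cm respectively give an MPJPE of 1.5 cm, matching the per-movement figures quoted in the abstract.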
98

Learning to Predict Dense Correspondences for 6D Pose Estimation

Brachmann, Eric 17 January 2018 (has links)
Object pose estimation is an important problem in computer vision with applications in robotics, augmented reality and many other areas. An established strategy for object pose estimation consists of, firstly, finding correspondences between the image and the object’s reference frame, and, secondly, estimating the pose from outlier-free correspondences using Random Sample Consensus (RANSAC). The first step, namely finding correspondences, is difficult because object appearance varies depending on perspective, lighting and many other factors. Traditionally, correspondences have been established using handcrafted methods like sparse feature pipelines. In this thesis, we introduce a dense correspondence representation for objects, called object coordinates, which can be learned. By learning object coordinates, our pose estimation pipeline adapts to various aspects of the task at hand. It works well for diverse object types, from small objects to entire rooms, varying object attributes, like textured or texture-less objects, and different input modalities, like RGB-D or RGB images. The concept of object coordinates allows us to easily model and exploit uncertainty as part of the pipeline such that even repeating structures or areas with little texture can contribute to a good solution. Although we can train object coordinate predictors independent of the full pipeline and achieve good results, training the pipeline in an end-to-end fashion is desirable. It enables the object coordinate predictor to adapt its output to the specificities of following steps in the pose estimation pipeline. Unfortunately, the RANSAC component of the pipeline is non-differentiable which prohibits end-to-end training. Adopting techniques from reinforcement learning, we introduce Differentiable Sample Consensus (DSAC), a formulation of RANSAC which allows us to train the pose estimation pipeline in an end-to-end fashion by minimizing the expectation of the final pose error.
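The RANSAC step described above can be illustrated on a toy problem: estimating a 2D translation from correspondences contaminated with outliers. The sketch uses the classic hard hypothesis selection; DSAC's differentiable relaxation replaces the argmax with an expectation and is not shown here.

```python
import math
import random

def ransac_translation(corr, iters=100, inlier_thresh=0.1, rng=None):
    """Estimate a 2D translation (dx, dy) from correspondences
    [(src, dst), ...] containing outliers.

    A single correspondence is a minimal sample for a translation:
    propose hypotheses from random samples, score each by its inlier
    count, and keep the best-supported hypothesis.
    """
    rng = rng or random.Random(0)
    best, best_inliers = None, -1
    for _ in range(iters):
        (sx, sy), (dx_, dy_) = rng.choice(corr)
        hyp = (dx_ - sx, dy_ - sy)  # minimal-sample hypothesis
        inliers = sum(
            1 for (ax, ay), (bx, by) in corr
            if math.hypot(bx - (ax + hyp[0]), by - (ay + hyp[1])) < inlier_thresh)
        if inliers > best_inliers:
            best, best_inliers = hyp, inliers
    return best, best_inliers
```

Hypotheses proposed from outlier correspondences gather almost no support, so the consensus step recovers the true motion despite the contamination, which is the robustness property the pipeline relies on.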
99

Hypothesis Generation for Object Pose Estimation: From Local Sampling to Global Reasoning

Michel, Frank 14 February 2019 (has links)
Pose estimation has been studied since the early days of computer vision. The task of object pose estimation is to determine the transformation that maps an object from its inherent coordinate system into the camera-centric coordinate system. This transformation describes the translation of the object relative to the camera and the orientation of the object in three-dimensional space. Knowledge of an object's pose is a key ingredient in many application scenarios like robotic grasping, augmented reality, autonomous navigation and surveillance. A general estimation pipeline consists of the following four steps: extraction of distinctive points, creation of a hypothesis pool, hypothesis verification and, finally, hypothesis refinement. In this work, we focus on the hypothesis generation process. We show that it is beneficial to utilize geometric knowledge in this process. We address the problem of hypothesis generation for articulated objects. Instead of considering each object part individually, we model the object as a kinematic chain. This enables us to use the inner-part relationships when sampling pose hypotheses, so that we need only K correspondences for objects consisting of K parts. We show that applying geometric knowledge about part relationships improves estimation accuracy under severe self-occlusion and low-quality correspondence predictions. In an extension, we employ global reasoning within the hypothesis generation process instead of sampling 6D pose hypotheses locally. To this end, we formulate a Conditional Random Field (CRF) that operates on the image as a whole, inferring those pixels that are consistent with the 6D pose. Within the CRF we use a strong geometric check that is able to assess the quality of correspondence pairs. We show that our global geometric check improves the accuracy of pose estimation under heavy occlusion.
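The kinematic-chain idea can be illustrated with a minimal planar forward-kinematics sketch (a hypothetical 2D simplification, not the thesis's 6D formulation): because each part's pose is composed from its parent's, one relative angle per part determines the whole chain, which is why one correspondence per part suffices to anchor a hypothesis.

```python
import math

def chain_joint_positions(angles, lengths):
    """2D forward kinematics of a kinematic chain.

    angles: joint angles, each relative to the parent link;
    lengths: link lengths. Returns the joint positions from the
    base to the end effector, composing each part's pose from
    its parent's.
    """
    x, y, phi = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for a, l in zip(angles, lengths):
        phi += a  # accumulate the relative angle along the chain
        x += l * math.cos(phi)
        y += l * math.sin(phi)
        positions.append((x, y))
    return positions
```

For a two-link chain with unit lengths and relative angles (0, 90°), the end effector lands at (1, 1): the second part's pose is fully constrained by its parent plus one parameter.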
100

Pose Estimation using Implicit Functions and Uncertainty in 3D

Blomstedt, Frida January 2023 (has links)
Human pose estimation in 3D is a large field within computer vision, with many application areas. A common approach is to first estimate the pose in 2D, resulting in a confidence heatmap, and then estimate the 3D pose using the most likely estimates in 2D. This may, however, cause problems in cases where pose estimates are more uncertain and the estimate of one point is far from the true position, for example when a limb is occluded. This thesis adapts the method Neural Radiance Fields (NeRF) to 2D confidence heatmaps in order to create an implicit representation of the uncertainty in 3D, thus attempting to make use of as much information in 2D as possible. The adapted method was evaluated on the Human3.6M dataset, and results show that it outperforms a simple triangulation baseline, especially when the estimation in 2D is far from the true pose.
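The 2D confidence heatmaps mentioned above can be reduced to expected positions with a soft-argmax, which, unlike a hard argmax, retains some of the distribution's uncertainty; a minimal sketch (not the thesis's NeRF-based method):

```python
def soft_argmax_2d(heatmap):
    """Expected (x, y) position under a non-negative 2D confidence map.

    Unlike a hard argmax, a multimodal or flat map pulls the estimate
    toward the distribution's mean, reflecting the uncertainty instead
    of committing to a single (possibly wrong) peak.
    """
    total = sum(sum(row) for row in heatmap)
    ex = sum(x * v for row in heatmap for x, v in enumerate(row)) / total
    ey = sum(y * v for y, row in enumerate(heatmap) for v in row) / total
    return ex, ey
```

A map with two equal peaks yields their midpoint rather than an arbitrary one of the two, illustrating why keeping the full heatmap, as the thesis does, preserves information that a hard 2D decision would discard.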
