• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 121
  • 10
  • 7
  • 5
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 182
  • 182
  • 94
  • 69
  • 45
  • 33
  • 33
  • 30
  • 28
  • 28
  • 27
  • 26
  • 25
  • 24
  • 23
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Reconstructing 3D Humans From Visual Data

Zheng, Ce 01 January 2023 (has links) (PDF)
Understanding humans in visual content is fundamental for numerous computer vision applications. Extensive research has been conducted in the field of human pose estimation (HPE) to accurately locate joints and construct body representations from images and videos. Expanding on HPE, human mesh recovery (HMR) addresses the more complex task of estimating the 3D pose and shape of the entire human body. HPE and HMR have gained significant attention due to their applications in areas such as digital human avatar modeling, AI coaching, and virtual reality [135]. However, HPE and HMR come with notable challenges, including intricate body articulation, occlusion, depth ambiguity, and the limited availability of annotated 3D data. Despite the progress made so far, the research community continues to strive for robust, accurate, and efficient solutions in HPE and HMR, advancing us closer to the ultimate goals in the field. This dissertation tackles various challenges in the domains of HPE and HMR. The initial focus is on video-based HPE, where we proposed a transformer architecture named PoseFormer [136] to leverage to capture the spatial relationships between body joints and temporal correlations across frames. This approach effectively harnesses the comprehensive connectivity and expressive power of transformers, leading to improved pose estimation accuracy in video sequences. Building upon this, the dissertation addresses the heavy computational and memory burden associated with image-based HMR. Our proposed Feater Map-based Transformer method (FeatER [133]) and a Pooling attention transformer method (POTTER[130]), demonstrate superior performance while significantly reducing computational and memory requirements compared to existing state-of-the-art techniques. Furthermore, a diffusion-based framework (DiffMesh[134]) is proposed for reconstructing high-quality human mesh outputs given input video sequences. These achievements provide practical and efficient solutions that cater to the demands of real-world applications in HPE and HMR. In this dissertation, our contributions advance the fields of HPE and HMR, bringing us closer to accurate and efficient solutions for understanding humans in visual content.
72

Rigorous Model of Panoramic Cameras

Shin, Sung Woong 31 March 2003 (has links)
No description available.
73

Deep Learning for estimation of fingertip location in 3-dimensional point clouds : An investigation of deep learning models for estimating fingertips in a 3D point cloud and its predictive uncertainty

Hölscher, Phillip January 2021 (has links)
Sensor technology is rapidly developing and, consequently, the generation of point cloud data is constantly increasing. Since the recent release of PointNet, it is possible to process this unordered 3-dimensional data directly in a neural network. The company TLT Screen AB, which develops cutting-edge tracking technology, seeks to optimize the localization of the fingertips of a hand in a point cloud. To do so, the identification of relevant 3D neural network models for modeling hands and detection of fingertips in various hand orientations is essential. The Hand PointNet processes point clouds of hands directly and generate estimations of fixed points (joints), including fingertips, of the hands. Therefore, this model was selected to optimize the localization of fingertips for TLT Screen AB and forms the subject of this research. The model has advantages over conventional convolutional neural networks (CNN). First of all, in contrast to the 2D CNN, the Hand PointNet can use the full 3-dimensional spatial information. Compared to the 3D CNN, moreover, it avoids unnecessarily voluminous data and enables more efficient learning. The model was trained and evaluated on the public dataset MRSA Hand. In contrast to previously published work, the main object of this investigation is the estimation of only 5 joints, for the fingertips. The behavior of the model with a reduction from the usual 21 to 11 and only 5 joints are examined. It is found that the reduction of joints contributed to an increase in the mean error of the estimated joints. Furthermore, the examination of the distribution of the residuals of the estimate for fingertips is found to be less dense. MC dropout to study the prediction uncertainty for the fingertips has shown that the uncertainty increases when the joints are decreased. Finally, the results show that the uncertainty is greatest for the prediction of the thumb tip. Starting from the tip of the thumb, it is observed that the uncertainty of the estimates decreases with each additional fingertip.
74

Learned structural and temporal context for dynamic 3D pose optimization and tracking

Patel, Mahir 30 September 2022 (has links)
Accurate 3D tracking of animals from video recordings is critical for many behavioral studies. However, other than for humans, there is a lack of publicly available datasets of videos of animals that the computer vision community could use for model development. Furthermore, due to occlusion and the uncontrollable nature of the animals, existing pose estimation models suffer from inadequate precision. People rely on biomechanical expertise to design mathematical models to optimize poses to mitigate this issue at the cost of generalization. We propose OptiPose, a generalizable attention-based deep learning pose optimization model, as a part of a post-processing pipeline for refining 3D poses estimated by pre-existing systems. Our experiments show how OptiPose is highly robust to noise and occlusion and can be used to optimize pose sequences provided by state-of-the-art models for animal pose estimation. Furthermore, we will make Rodent3D, a multimodal (RGB, Thermal, and Depth) dataset for rats, publicly available.
75

3-D Face Modeling from a 2-D Image with Shape and Head Pose Estimation

Oyini Mbouna, Ralph January 2014 (has links)
This paper presents 3-D face modeling with head pose and depth information estimated from a 2-D query face image. Many recent approaches to 3-D face modeling are based on a 3-D morphable model that separately encodes the shape and texture in a parameterized model. The model parameters are often obtained by applying statistical analysis to a set of scanned 3-D faces. Such approaches tend to depend on the number and quality of scanned 3-D faces, which are difficult to obtain and computationally intensive. To overcome the limitations of 3-D morphable models, several modeling techniques from 2-D images have been proposed. We propose a novel framework for depth estimation from a single 2-D image with an arbitrary pose. The proposed scheme uses a set of facial features in a query face image and a reference 3-D face model to estimate the head pose angles of the face. The depth information of the subject at each feature point is represented by the depth information of the reference 3-D face model multiplied by a vector of scale factors. We use the positions of a set of facial feature points on the query 2-D image to deform the reference face dense model into a person specific 3-D face by minimizing an objective function. The objective function is defined as the feature disparity between the facial features in the face image and the corresponding 3-D facial features on the rotated reference model projected onto 2-D space. The pose and depth parameters are iteratively refined until stopping criteria are reached. The proposed method requires only a face image of arbitrary pose for the reconstruction of the corresponding 3-D face dense model with texture. Experiment results with USF Human-ID and Pointing'04 databases show that the proposed approach is effective to estimate depth and head pose information with a single 2-D image. / Electrical and Computer Engineering
76

Design of Viewpoint-Equivariant Networks to Improve Human Pose Estimation

Garau, Nicola 31 May 2022 (has links)
Human pose estimation (HPE) is an ever-growing research field, with an increasing number of publications in the computer vision and deep learning fields and it covers a multitude of practical scenarios, from sports to entertainment and from surveillance to medical applications. Despite the impressive results that can be obtained with HPE, there are still many problems that need to be tackled when dealing with real-world applications. Most of the issues are linked to a poor or completely wrong detection of the pose that emerges from the inability of the network to model the viewpoint. This thesis shows how designing viewpoint-equivariant neural networks can lead to substantial improvements in the field of human pose estimation, both in terms of state-of-the-art results and better real-world applications. By jointly learning how to build hierarchical human body poses together with the observer viewpoint, a network can learn to generalise its predictions when dealing with previously unseen viewpoints. As a result, the amount of training data needed can be drastically reduced, simultaneously leading to faster and more efficient training and more robust and interpretable real-world applications.
77

Gappy POD and Temporal Correspondence for Lizard Motion Estimation

Kurdila, Hannah Robertshaw 20 June 2018 (has links)
With the maturity of conventional industrial robots, there has been increasing interest in designing robots that emulate realistic animal motions. This discipline requires careful and systematic investigation of a wide range of animal motions from biped, to quadruped, and even to serpentine motion of centipedes, millipedes, and snakes. Collecting optical motion capture data of such complex animal motions can be complicated for several reasons. Often there is the need to use many high-quality cameras for detailed subject tracking, and self-occlusion, loss of focus, and contrast variations challenge any imaging experiment. The problem of self-occlusion is especially pronounced for animals. In this thesis, we walk through the process of collecting motion capture data of a running lizard. In our collected raw video footage, it is difficult to make temporal correspondences using interpolation methods because of prolonged blurriness, occlusion, or the limited field of vision of our cameras. To work around this, we first make a model data set by making our best guess of the points' locations through these corruptions. Then, we randomly eclipse the data, use Gappy POD to repair the data and then see how closely it resembles the initial set, culminating in a test case where we simulate the actual corruptions we see in the raw video footage. / Master of Science
78

Repousser les limites de l'identification faciale en contexte de vidéo-surveillance / Breaking the limits of facial identification in video-surveillance context.

Fiche, Cécile 31 January 2012 (has links)
Les systèmes d'identification de personnes basés sur le visage deviennent de plus en plus répandus et trouvent des applications très variées, en particulier dans le domaine de la vidéosurveillance. Or, dans ce contexte, les performances des algorithmes de reconnaissance faciale dépendent largement des conditions d'acquisition des images, en particulier lorsque la pose varie mais également parce que les méthodes d'acquisition elles mêmes peuvent introduire des artéfacts. On parle principalement ici de maladresse de mise au point pouvant entraîner du flou sur l'image ou bien d'erreurs liées à la compression et faisant apparaître des effets de blocs. Le travail réalisé au cours de la thèse porte donc sur la reconnaissance de visages à partir d'images acquises à l'aide de caméras de vidéosurveillance, présentant des artéfacts de flou ou de bloc ou bien des visages avec des poses variables. Nous proposons dans un premier temps une nouvelle approche permettant d'améliorer de façon significative la reconnaissance des visages avec un niveau de flou élevé ou présentant de forts effets de bloc. La méthode, à l'aide de métriques spécifiques, permet d'évaluer la qualité de l'image d'entrée et d'adapter en conséquence la base d'apprentissage des algorithmes de reconnaissance. Dans un second temps, nous nous sommes focalisés sur l'estimation de la pose du visage. En effet, il est généralement très difficile de reconnaître un visage lorsque celui-ci n'est pas de face et la plupart des algorithmes d'identification de visages considérés comme peu sensibles à ce paramètre nécessitent de connaître la pose pour atteindre un taux de reconnaissance intéressant en un temps relativement court. Nous avons donc développé une méthode d'estimation de la pose en nous basant sur des méthodes de reconnaissance récentes afin d'obtenir une estimation rapide et suffisante de ce paramètre. / The person identification systems based on face recognition are becoming increasingly widespread and are being used in very diverse applications, particularly in the field of video surveillance. In this context, the performance of the facial recognition algorithms largely depends on the image acquisition context, especially because the pose can vary, but also because the acquisition methods themselves can introduce artifacts. The main issues are focus imprecision, which can lead to blurred images, or the errors related to compression, which can introduce the block artifact. The work done during the thesis focuses on facial recognition in images taken by video surveillance cameras, in cases where the images contain blur or block artifacts or show various poses. First, we are proposing a new approach that allows to significantly improve facial recognition in images with high blur levels or with strong block artifacts. The method, which makes use of specific noreference metrics, starts with the evaluation of the quality level of the input image and then adapts the training database of the recognition algorithms accordingly. Second, we have focused on the facial pose estimation. Normally, it is very difficult to recognize a face in an image taken from another viewpoint than the frontal one and the majority of facial identification algorithms which are robust to pose variation need to know the pose in order to achieve a satisfying recognition rate in a relatively short time. We have therefore developed a fast and satisfying pose estimation method based on recent recognition techniques.
79

Real-Time Head Pose Estimation in Low-Resolution Football Footage / Realtidsestimering av huvudets vridning i lågupplösta videosekvenser från fotbollsmatcher

Launila, Andreas January 2009 (has links)
This report examines the problem of real-time head pose estimation in low-resolution football footage. A method is presented for inferring the head pose using a combination of footage and knowledge of the locations of the football and players. An ensemble of randomized ferns is compared with a support vector machine for processing the footage, while a support vector machine performs pattern recognition on the location data. Combining the two sources of information outperforms either in isolation. The location of the football turns out to be an important piece of information. / QC 20100707 / Capturing and Visualizing Large scale Human Action (ACTVIS)
80

Vision-Based Techniques for Cognitive and Motor Skill Assessments

Floyd, Beatrice K. 24 August 2012 (has links)
No description available.

Page generated in 0.1077 seconds