21

Multi-person Pose Estimation in Soccer Videos with Convolutional Neural Networks

Skyttner, Axel January 2018 (has links)
Pose estimation is the problem of detecting the poses of people in images; multi-person pose estimation extends this to multiple people per image. This thesis investigates multi-person pose estimation by applying the associative embedding method to images from soccer videos. Three models are compared: a pre-trained model, a fine-tuned model, and a model extended to handle image sequences. The pre-trained model performed well on soccer images, and the fine-tuned model performed better than the pre-trained one. The image sequence model performed on par with the fine-tuned model but no better. The thesis concludes that the associative embedding model is a feasible option for pose estimation in soccer videos and merits further research.
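The grouping step at the heart of associative embedding can be sketched in a few lines: the network predicts, alongside each joint heatmap, a scalar "tag" per keypoint, and keypoints whose tags lie close together are assigned to the same person. Below is a minimal illustrative sketch, assuming keypoints have already been extracted from the heatmaps; the tuple layout, greedy ordering, and threshold are assumptions, not the thesis code.

```python
# Minimal sketch of associative-embedding grouping (illustrative, not the
# thesis implementation): cluster keypoints into persons by tag distance.
import numpy as np

def group_by_tags(keypoints, tag_threshold=1.0):
    """keypoints: list of (joint_id, x, y, score, tag) tuples."""
    persons = []  # each person: {"tags": [...], "joints": {joint_id: (x, y, score)}}
    for joint_id, x, y, score, tag in sorted(keypoints, key=lambda k: -k[3]):
        best, best_dist = None, tag_threshold
        for person in persons:
            if joint_id in person["joints"]:
                continue  # at most one joint of each type per person
            dist = abs(tag - np.mean(person["tags"]))
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:                 # no close person: start a new one
            best = {"tags": [], "joints": {}}
            persons.append(best)
        best["tags"].append(tag)
        best["joints"][joint_id] = (x, y, score)
    return persons
```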
22

Height Estimation of a Blimp Unmanned Aerial Vehicle Using Inertial Measurement Unit and Infrared Camera

Villeneuve, Hubert January 2017 (has links)
Increasing demands in areas such as security, surveillance, search and rescue, and communication have promoted the research and development of unmanned aerial vehicles (UAVs), since such technologies can replace manned flights in dangerous or unfavorable conditions. Lighter-than-air UAVs such as blimps can carry higher payloads and stay airborne longer than typical heavier-than-air UAVs such as aeroplanes or quadrotors. One purpose of this thesis is to develop a sensor suite for estimating the position and orientation of a blimp UAV under development, relative to a reference point, for safer landing procedures using minimal on-board sensors. While the existing low-cost sensor package, including an inertial measurement unit (IMU) and a Global Positioning System (GPS) module, could estimate the pose of the blimp to a certain extent, the GPS module is imprecise in the short term, especially for altitude. The proposed system therefore combines GPS and inertial data with information from a ground-based infrared (IR) camera. Image frames are processed to identify three IR LEDs mounted on the UAV, and the pose is estimated from the detected LED coordinates using a Perspective-n-Point (PnP) algorithm. The PnP results are then fused with the GPS, accelerometer, and gyroscope measurements using an Extended Kalman Filter (EKF) to obtain a more accurate estimate of the position and orientation. Tests were conducted on a simulated blimp using the experimental avionics.
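As a rough illustration of the fusion described above, the sketch below implements an EKF with a reduced position-and-velocity state, propagated by accelerometer input and corrected by the position a PnP solver would return from the three LED detections. The state layout and noise values are assumptions; the thesis's full filter also estimates orientation.

```python
# Simplified EKF sketch (illustrative assumptions throughout): state is
# [position, velocity], predicted with accelerometer input and updated
# with the PnP position measurement.
import numpy as np

class BlimpEKF:
    def __init__(self):
        self.x = np.zeros(6)            # [px, py, pz, vx, vy, vz]
        self.P = np.eye(6)
        self.Q = np.eye(6) * 0.01       # process noise (assumed)
        self.R = np.eye(3) * 0.05       # PnP measurement noise (assumed)

    def predict(self, accel, dt):
        F = np.eye(6)
        F[:3, 3:] = np.eye(3) * dt      # position integrates velocity
        self.x = F @ self.x
        self.x[3:] += accel * dt        # velocity integrates acceleration
        self.P = F @ self.P @ F.T + self.Q

    def update_pnp(self, pnp_position):
        H = np.hstack([np.eye(3), np.zeros((3, 3))])  # observe position only
        y = pnp_position - H @ self.x                 # innovation
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)           # Kalman gain
        self.x += K @ y
        self.P = (np.eye(6) - K @ H) @ self.P
```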
23

VECTOR REPRESENTATION TO ENHANCE POSE ESTIMATION FROM RGB IMAGES

Zongcheng Chu (8791457) 03 May 2020 (has links)
Head pose estimation is an essential task in computer vision. Existing research on pose estimation from RGB images mainly uses either Euler angles or quaternions to predict pose. Nevertheless, both Euler angle- and quaternion-based approaches encounter discontinuities when describing three-dimensional rotations. This issue makes visual patterns more difficult for a convolutional neural network (CNN) to learn, which in turn compromises estimation performance. To solve this problem, we introduce TriNet, a novel method based on three vectors converted from the three Euler angles (roll, pitch, yaw). The orthogonality of the three vectors enables us to implement a complementary multi-loss function, which effectively reduces prediction error. Our method achieves state-of-the-art performance on the AFLW2000, AFW, and BIWI datasets. We also extend our work to general object pose estimation and report results in the experiments section.
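The vector representation can be made concrete with a short sketch: the three Euler angles define a rotation matrix whose three columns are mutually orthogonal unit vectors, and each predicted vector can be penalized separately. The ZYX convention and the cosine loss below are illustrative assumptions, not necessarily TriNet's exact formulation.

```python
# Sketch of the angle-to-vector conversion: the rotation matrix built from
# (roll, pitch, yaw) yields three orthogonal unit vectors, which avoid the
# wrap-around discontinuity of raw angles.
import numpy as np

def euler_to_vectors(roll, pitch, yaw):
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    R = Rz @ Ry @ Rx                     # one common ZYX convention (assumed)
    return R[:, 0], R[:, 1], R[:, 2]     # three orthogonal unit vectors

def vector_loss(pred, target):
    # Cosine distance per vector; summing over the three vectors gives a
    # complementary multi-term loss, since the vectors constrain each other.
    return 1.0 - np.dot(pred, target) / (np.linalg.norm(pred) * np.linalg.norm(target))
```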
24

Generating 3D Scenes From Single RGB Images in Real-Time Using Neural Networks

Grundberg, Måns, Altintas, Viktor January 2021 (has links)
The ability to reconstruct 3D scenes of environments is of great interest in a number of fields such as autonomous driving, surveillance, and virtual reality. However, traditional methods often rely on multiple cameras or sensor-based depth measurements to reconstruct 3D scenes accurately. In this thesis we propose an alternative, deep learning-based approach to 3D scene reconstruction for objects of interest, using nothing but single RGB images. We evaluate our approach using the Deep Object Pose Estimation (DOPE) neural network for object detection and pose estimation, and the NVIDIA Deep Learning Dataset Synthesizer for synthetic data generation. Using two unique objects, our results indicate that it is possible to reconstruct 3D scenes from single RGB images within an error margin of a few centimeters.
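For context on how such a pipeline recovers object pose: DOPE predicts the 2D image locations of the vertices of an object's 3D bounding box, from which a PnP solve against the known cuboid dimensions yields the 6-DoF pose used to place the object in the scene. The sketch below illustrates that final step; the cuboid layout and camera intrinsics are illustrative assumptions.

```python
# Sketch of recovering a 6-DoF object pose from predicted cuboid keypoints
# (illustrative; assumes 8 detected 2D vertices and known box dimensions).
import numpy as np
import cv2

def cuboid_vertices(w, h, d):
    # 8 corners of an object-centered box, in object coordinates
    return np.array([[sx * w / 2, sy * h / 2, sz * d / 2]
                     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                    dtype=np.float32)

def pose_from_keypoints(keypoints_2d, box_size, camera_matrix):
    """keypoints_2d: (8, 2) predicted vertex locations; box_size: (w, h, d)."""
    object_points = cuboid_vertices(*box_size)
    ok, rvec, tvec = cv2.solvePnP(object_points,
                                  keypoints_2d.astype(np.float32),
                                  camera_matrix, None)
    return rvec, tvec  # object rotation (Rodrigues vector) and translation
```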
25

Vehicle-pedestrian interaction using naturalistic driving video through tractography of relative positions and pedestrian pose estimation

Mueid, Rifat M. 11 April 2017 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Research on robust Pre-Collision Systems (PCS) requires new techniques that allow a better understanding of the vehicle-pedestrian dynamic relationship and can predict future pedestrian movements. Our research analyzed videos from the Transportation Active Safety Institute (TASI) 110-Car naturalistic driving dataset to extract two dynamic pedestrian semantic features. The dataset consists of videos recorded with forward-facing cameras from 110 cars over a year in all weather and illumination conditions. This research focuses on potential-conflict situations, where a collision may happen if no avoidance action is taken by the driver or pedestrian. We used 1,000 such 15-second videos to find vehicle-pedestrian relative dynamic trajectories and pedestrian poses. Adaptive structural local appearance model and particle filter methods were implemented and modified to track the pedestrians more accurately. We developed a new algorithm to compute the Focus of Expansion (FoE) automatically; the automatically detected FoE height data have a correlation of 0.98 with carefully hand-annotated data. We obtained correct tractography results for over 82% of the videos. For pose estimation, we used a flexible mixture model to capture co-occurrence between pedestrian body segments. Building on an existing single-frame human pose estimation model, we introduced Kalman filtering and temporal movement reduction techniques to produce stable stick-figure videos of the pedestrians' dynamic motion, reducing frame-to-frame pixel offset by 86% compared to the single-frame method. These tractographs and pose estimation data were used as features to train a neural network for classifying ‘potential conflict’ and ‘no potential conflict’ situations; training achieved 91.2% true-label accuracy and an 8.8% false-label rate. Finally, the trained network was used to assess the probability of collision over time for the 15-second videos, generating a spike when there is a ‘potential conflict’ situation. We also tested our method with TASI mannequin crash data and obtained a danger spike for 70% of those videos. This research enables new analysis of potential-conflict pedestrian cases with 2D tractography data and stick-figure pose representations of pedestrians, providing significant insight into the vehicle-pedestrian dynamics that are critical for safe autonomous driving and transportation safety innovations.
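One standard way to compute the FoE, consistent with the description above though not necessarily the thesis's algorithm, is to treat it as the least-squares intersection point of the optical-flow lines:

```python
# Illustrative FoE computation: flow vectors in a forward-moving camera
# radiate from the focus of expansion, so the FoE is recovered as the
# least-squares intersection of the flow lines.
import numpy as np

def focus_of_expansion(points, flows):
    """points: (N, 2) pixel positions; flows: (N, 2) optical-flow vectors."""
    # A line through p with direction d satisfies n . x = n . p, where n
    # is the normal of d; stacking these rows gives a linear system A x = b.
    normals = np.stack([-flows[:, 1], flows[:, 0]], axis=1)
    A = normals
    b = np.einsum("ij,ij->i", normals, points)
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe  # (x, y) of the focus of expansion in pixels
```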
26

Exploring the Feasibility of Machine Learning Techniques in Recognizing Complex Human Activities

Hu, Shengnan 01 January 2023 (has links) (PDF)
This dissertation introduces several technical innovations that improve the ability of machine learning models to recognize a wide range of complex human activities. As human sensor data becomes more abundant, the need to develop algorithms for understanding and interpreting complex human actions has become increasingly important. Our research focuses on three key areas: multi-agent activity recognition, multi-person pose estimation, and multimodal fusion. To tackle the problem of monitoring coordinated team activities from spatio-temporal traces, we introduce a new framework that incorporates field of view data to predict team performance. Our framework uses Spatial Temporal Graph Convolutional Networks (ST-GCN) and recurrent neural network layers to capture and model the dynamic spatial relationships between agents. The second part of the dissertation addresses the problem of multi-person pose estimation (MPPE) from video data. Our proposed technique (Language-Assisted Multi-person Pose Estimation) leverages text representations from multimodal foundation models to learn a visual representation that is more robust to occlusion. By infusing semantic information into pose estimation, our approach enables precise estimates, even in cluttered scenes. The final part of the dissertation examines the problem of fusing multimodal physiological input from cardiovascular and gaze tracking sensors to exploit the complementary nature of these modalities. When dealing with multimodal features, uncovering the correlations between different modalities is as crucial as identifying effective unimodal features. This dissertation introduces a hybrid multimodal tensor fusion network that is effective at learning both unimodal and bimodal dynamics. The outcomes of this dissertation contribute to advancing the field of complex human activity recognition by addressing the challenges associated with multi-agent activity recognition, multi-person pose estimation, and multimodal fusion. The proposed innovations have potential applications in various domains, including video surveillance, human-robot interaction, sports analysis, and healthcare monitoring. By developing intelligent systems capable of accurately recognizing complex human activities, this research paves the way for improved safety, efficiency, and decision-making in a wide range of real-world applications.
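The bimodal part of such a fusion can be illustrated with the common tensor-fusion construction: appending a constant 1 to each unimodal feature vector before taking their outer product yields a tensor containing both the unimodal features and all pairwise interaction terms. The sketch below follows that generic formulation, not the dissertation's exact network; the dimensions are arbitrary.

```python
# Generic tensor-fusion sketch (illustrative): the outer product of
# 1-augmented feature vectors captures unimodal and bimodal terms at once.
import numpy as np

def tensor_fusion(cardio_feat, gaze_feat):
    z_c = np.append(cardio_feat, 1.0)   # (C+1,) keeps the unimodal terms
    z_g = np.append(gaze_feat, 1.0)     # (G+1,)
    fused = np.outer(z_c, z_g)          # (C+1, G+1): unimodal + bimodal terms
    return fused.ravel()                # flattened input to downstream layers

# Example: an 8-dim cardiovascular embedding fused with a 6-dim gaze embedding
fused = tensor_fusion(np.random.randn(8), np.random.randn(6))
print(fused.shape)  # (63,) = 9 * 7
```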
27

A Comparison of Two-Dimensional Pose Estimation Algorithms Based on Natural Features

Korte, Christopher M. 23 September 2011 (has links)
No description available.
28

3D Deep Learning for Object-Centric Geometric Perception

Li, Xiaolong 30 June 2022 (has links)
Object-centric geometric perception aims at extracting the geometric attributes of 3D objects. These attributes include shape, pose, and motion of the target objects, which enable fine-grained object-level understanding for various tasks in graphics, computer vision, and robotics. With the growth of 3D geometry data and 3D deep learning methods, it becomes more and more likely to achieve such tasks directly using 3D input data. Among different 3D representations, a 3D point cloud is a simple, common, and memory-efficient representation that could be directly retrieved from multi-view images, depth scans, or LiDAR range images. Different challenges exist in achieving object-centric geometric perception, such as achieving a fine-grained geometric understanding of common articulated objects with multiple rigid parts, learning disentangled shape and pose representations with fewer labels, or tackling dynamic and sequential geometric input in an end-to-end fashion. Here we identify and solve these challenges from a 3D deep learning perspective by designing effective and generalizable 3D representations, architectures, and pipelines. We propose the first deep pose estimation for common articulated objects by designing a novel hierarchical invariant representation. To push the boundary of 6D pose estimation for common rigid objects, a simple yet effective self-supervised framework is designed to handle unlabeled partial segmented scans. We further contribute a novel 4D convolutional neural network called PointMotionNet to learn spatio-temporal features for 3D point cloud sequences. All these works advance the domain of object-centric geometric perception from a unique 3D deep learning perspective. / Doctor of Philosophy / 3D sensors these days are widely equipped on various mobile devices like a depth camera on iPhone, or laser LiDAR sensors on an autonomous driving vehicle. These 3D sensing techniques could help us get accurate measurements of the 3D world. For the field of machine intel- ligence, we also want to build intelligent system and algorithm to learn useful information and understand the 3D world better. We human beings have the incredible ability to sense and understand this 3D world through our visual or tactile system. For example, humans could infer the geometry structure and arrangement of furniture in a room without seeing the full room, we are able to track an 3D object no matter how its appearance, shape and scale changes, we could also predict the future motion of multiple objects based on sequential observation and complex reasoning. Here my work designs various frameworks to learn such 3D information from geometric data represented by a lot of 3D points, which achieves fine-grained geometric understanding of individual objects, and we can help machine tell the target objects' geometry, states, and dynamics. The work in this dissertation serves as building blocks towards a better understanding of this dynamic world.
29

Real-Time Head Pose Estimation in Low-Resolution Football Footage / Realtidsestimering av huvudets vridning i lågupplösta videosekvenser från fotbollsmatcher

Launila, Andreas January 2009 (has links)
This report examines the problem of real-time head pose estimation in low-resolution football footage. A method is presented for inferring the head pose using a combination of footage and knowledge of the locations of the football and players. An ensemble of randomized ferns is compared with a support vector machine for processing the footage, while a support vector machine performs pattern recognition on the location data. Combining the two sources of information outperforms either in isolation. The location of the football turns out to be an important piece of information. / Capturing and Visualizing Large scale Human Action (ACTVIS)
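A minimal sketch of the two-source combination, assuming simple probability averaging and SVMs for both sources (the report itself compares randomized ferns against an SVM for the footage):

```python
# Illustrative late fusion of an image-based and a location-based classifier;
# the averaging weights and choice of two SVMs are assumptions.
import numpy as np
from sklearn.svm import SVC

def train_combined(X_footage, X_location, y):
    clf_img = SVC(probability=True).fit(X_footage, y)   # scores image patches
    clf_loc = SVC(probability=True).fit(X_location, y)  # scores ball/player locations
    return clf_img, clf_loc

def predict_combined(clf_img, clf_loc, X_footage, X_location):
    p = 0.5 * clf_img.predict_proba(X_footage) + \
        0.5 * clf_loc.predict_proba(X_location)
    return p.argmax(axis=1)  # fused head-pose class per sample
```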
30

3D Vision Geometry for Rolling Shutter Cameras / Géométrie pour la vision 3D avec des caméras Rolling Shutter

Lao, Yizhen 16 May 2019 (has links)
Many modern CMOS cameras are equipped with Rolling Shutter (RS) sensors, which are low-cost, low-consumption, and fast. In this acquisition mode, the pixel rows are exposed sequentially from the top to the bottom of the image. Images captured by moving RS cameras therefore exhibit distortions (e.g. wobble and skew) that make classic algorithms at best less precise, at worst unusable due to singularities or degeneracies. The goal of this thesis is to propose a general framework for modelling and solving Structure-from-Motion (SfM) with RS cameras, addressing each sub-task of the SfM pipeline (namely image correction, absolute and relative pose estimation, and bundle adjustment) and proposing improvements. The first part of this manuscript presents a novel RS correction method that uses line features. Unlike existing methods, which rely on iterative solutions and the Manhattan World (MW) assumption, our method R4C computes the camera's instantaneous motion linearly from few image features. The method is integrated into a RANSAC-like framework that detects curves corresponding to actual 3D straight lines and rejects outlier curves, making image correction more robust and fully automated. The second part revisits Bundle Adjustment (BA) for RS images. It addresses a limitation of existing RS bundle adjustment methods when the read-out directions of the RS views are close, a common configuration in many real-life applications such as vehicle-mounted cameras. We propose a novel camera-based RS projection algorithm, in which the row index of a projected point varies during optimization so as to stay geometrically consistent, and incorporate it into RS BA to compute reprojection errors; this allows SfM to survive the degenerate configuration mentioned above. The third part extends the concept of homography to RS images, showing that the point-to-point relation between two RS views of a coplanar point cloud can be expressed with three to seven 3x3 matrices depending on the motion model, and presents linear solvers for computing these matrices, including a practical 13-point solver. Two essential computer vision applications of the RS homography are demonstrated: plane-based RS relative pose estimation and RS image stitching. The last part of the thesis studies the absolute camera pose problem (PnP) and SfM with RS effects by drawing analogies with non-rigid vision, namely Shape-from-Template (SfT) and Non-Rigid SfM (NRSfM): an RS image of a rigid scene in motion can be interpreted as a Global Shutter (GS) image of a virtually deformed surface. Unlike existing methods, which perform 3D-2D registration after augmenting the GS projection model with velocity parameters under various kinematic models, we use local differential constraints. The proposed methods outperform the state of the art and handle configurations that are critical for existing methods.
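All of these methods build on the same underlying projection model: each image row is exposed at a different time, so the pose used to project a 3D point depends on the row the point lands in. The sketch below shows a common fixed-point formulation of RS projection under the usual constant-velocity assumption; the parameterization is illustrative, not the thesis's exact solver.

```python
# Illustrative rolling-shutter projection: the row index of the projected
# point feeds back into the exposure time, solved by fixed-point iteration.
import numpy as np

def rs_project(X, K, R0, t0, omega, v_lin, line_delay, n_iter=5):
    """Project 3D point X with frame-start pose (R0, t0), angular velocity
    omega (axis-angle, rad/s), linear velocity v_lin (m/s), and per-row
    exposure delay line_delay (s); K is the 3x3 intrinsics matrix."""
    row = 0.0
    for _ in range(n_iter):
        tau = row * line_delay                 # exposure time of current row guess
        theta = np.linalg.norm(omega * tau)
        if theta < 1e-12:
            R_tau = np.eye(3)
        else:                                  # Rodrigues formula for the extra rotation
            k = omega * tau / theta
            Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
            R_tau = np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * Kx @ Kx
        x = K @ (R_tau @ (R0 @ X + t0) + v_lin * tau)
        u, v = x[0] / x[2], x[1] / x[2]
        row = v                                # row index feeds back into time
    return u, v
```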
