Global ETD Search

1	Estimation of Human Poses Categories and Physical Object Properties from Motion Trajectories Fathollahi Ghezelghieh, Mona 22 June 2017 (has links) Despite the impressive advancements in people detection and tracking, safety is still a key barrier to the deployment of autonomous vehicles in urban environments [1]. For example, in non-autonomous technology, there is an implicit communication between the people crossing the street and the driver to make sure they have communicated their intent to the driver. Therefore, it is crucial for the autonomous car to infer the future intent of the pedestrian quickly. We believe that human body orientation with respect to the camera can help the intelligent unit of the car to anticipate the future movement of the pedestrians. To further improve the safety of pedestrians, it is important to recognize whether they are distracted, carrying a baby, or pushing a shopping cart. Therefore, estimating the fine- grained 3D pose, i.e. (x,y,z)-coordinates of the body joints provides additional information for decision-making units of driverless cars. In this dissertation, we have proposed a deep learning-based solution to classify the categorized body orientation in still images. We have also proposed an efficient framework based on our body orientation classification scheme to estimate human 3D pose in monocular RGB images. Furthermore, we have utilized the dynamics of human motion to infer the body orientation in image sequences. To achieve this, we employ a recurrent neural network model to estimate continuous body orientation from the trajectories of body joints in the image plane. The proposed body orientation and 3D pose estimation framework are tested on the largest 3D pose estimation benchmark, Human3.6m (both in still images and video), and we have proved the efficacy of our approach by benchmarking it against the state-of-the-art approaches. Another critical feature of self-driving car is to avoid an obstacle. In the current prototypes the car either stops or changes its lane even if it causes other traffic disruptions. However, there are situations when it is preferable to collide with the object, for example a foam box, rather than take an action that could result in a much more serious accident than collision with the object. In this dissertation, for the first time, we have presented a novel method to discriminate between physical properties of these types of objects such as bounciness, elasticity, etc. based on their motion characteristics . The proposed algorithm is tested on synthetic data, and, as a proof of concept, its effectiveness on a limited set of real-world data is demonstrated. 3D Human Pose Camera Viewpoint Deep Neural Network Computer Sciences
2	Unsupervised 3D Human Pose Estimation / Oövervakad mänsklig poseuppskattning i 3D Budaraju, Sri Datta January 2021 (has links) The thesis proposes an unsupervised representation learning method to predict 3D human pose from a 2D skeleton via a VAEGAN (Variational Autoencoder Generative Adversarial Network) hybrid network. The method learns to lift poses from 2D to 3D using selfsupervision and adversarial learning techniques. The method does not use images, heatmaps, 3D pose annotations, paired/unpaired 2Dto3D skeletons, 3D priors, synthetic 2D skeletons, multiview or temporal information in any shape or form. The 2D skeleton input is taken by a VAE that encodes it in a latent space and then decodes that latent representation to a 3D pose. The 3D pose is then reprojected to 2D for a constrained, selfsupervised optimization using the input 2D pose. Parallelly, the 3D pose is also randomly rotated and reprojected to 2D to generate a ’novel’ 2D view for unconstrained adversarial optimization using a discriminator network. The combination of the optimizations of the original and the novel 2D views of the predicted 3D pose results in a ’realistic’ 3D pose generation. The thesis shows that the encoding and decoding process of the VAE addresses the major challenge of erroneous and incomplete skeletons from 2D detection networks as inputs and that the variance of the VAE can be altered to get various plausible 3D poses for a given 2D input. Additionally, the latent representation could be used for crossmodal training and many downstream applications. The results on Human3.6M datasets outperform previous unsupervised approaches with less model complexity while addressing more hurdles in scaling the task to the real world. / Uppsatsen föreslår en oövervakad metod för representationslärande för att förutsäga en 3Dpose från ett 2D skelett med hjälp av ett VAE GAN (Variationellt Autoenkodande Generativt Adversariellt Nätverk) hybrid neuralt nätverk. Metoden lär sig att utvidga poser från 2D till 3D genom att använda självövervakning och adversariella inlärningstekniker. Metoden använder sig vare sig av bilder, värmekartor, 3D poseannotationer, parade/oparade 2D till 3D skelett, a priori information i 3D, syntetiska 2Dskelett, flera vyer, eller tidsinformation. 2Dskelettindata tas från ett VAE som kodar det i en latent rymd och sedan avkodar den latenta representationen till en 3Dpose. 3D posen är sedan återprojicerad till 2D för att genomgå begränsad, självövervakad optimering med hjälp av den tvådimensionella posen. Parallellt roteras dessutom 3Dposen slumpmässigt och återprojiceras till 2D för att generera en ny 2D vy för obegränsad adversariell optimering med hjälp av ett diskriminatornätverk. Kombinationen av optimeringarna av den ursprungliga och den nya 2Dvyn av den förutsagda 3Dposen resulterar i en realistisk 3Dposegenerering. Resultaten i uppsatsen visar att kodningsoch avkodningsprocessen av VAE adresserar utmaningen med felaktiga och ofullständiga skelett från 2D detekteringsnätverk som indata och att variansen av VAE kan modifieras för att få flera troliga 3D poser för givna 2D indata. Dessutom kan den latenta representationen användas för crossmodal träning och flera nedströmsapplikationer. Resultaten på datamängder från Human3.6M är bättre än tidigare oövervakade metoder med mindre modellkomplexitet samtidigt som de adresserar flera hinder för att skala upp uppgiften till verkliga tillämpningar. Computer Vision Projective Geometry Deep Learning Unsupervised Learning 3D Human Pose Estimation GAN AutoEncoder Hybrid Generative Model Self Supervision Computer and Information Sciences Data- och informationsvetenskap
3	Monocular 3D Human Pose Estimation / Monokulär 3D-människans hållningsuppskattning Rey, Robert January 2023 (has links) The focus of this work is the task of 3D human pose estimation, more specifically by making use of key points located in single monocular images in order to estimate the location of human body joints in a 3D space. It was done in association with Tracab, a company based in Stockholm, who specialises in advanced sports tracking and analytics solutions. Tracab’s core product is their optical tracking system for football, which involves installing multiple highspeed cameras around the sports venue. One of the main benefits of this work will be to reduce the number of cameras required to create the 3D skeletons of the players, hence reducing production costs as well as making the whole process of creating the 3D skeletons much simpler in the future. The main problem we are tackling consists in going from a set of 2D joint locations and lifting them to a 3D space, which would add an information of depth to the joint locations. One problem with this task is the limited availability of in-thewild datasets with corresponding 3D ground truth labels. We hope to tackle this issue by making use of the restricted Human3.6m dataset along with the Tracab dataset in order to achieve adequate results. Since the Tracab dataset is very large, i.e millions of unique poses and skeletons, we have focused our experiments on a single football game. Although extensive research has been done in the field by using architectures such as convolutional neural networks, transformers, spatial-temporal architectures and more, we are tackling this issue by making use of a simple feedforward neural network developed by Martinez et al, this is mainly possible due to the abundance of data available at Tracab. / Fokus för detta arbete är att estimera 3D kroppspositioner, genom att använda detekterade punkter på människokroppen i enskilda monokulära bilder för att uppskatta 3D positionen av dessa ledpunkter. Detta arbete genomfördes i samarbete med Tracab, ett företag baserat i Stockholm, som specialiserar sig på avancerade lösningar för följning och analys inom idrott. Tracabs huvudprodukt är deras optiska följningssystem, som innebär att flera synkroniserade höghastighetskameror installeras runt arenan. En av de främsta fördelarna med detta arbete kommer att vara att minska antalet kameror som krävs för att skapa 3D-skelett av spelarna, vilket minskar produktionskostnaderna och förenklar hela processen för att skapa 3D-skelett i framtiden. Huvudproblemet vi angriper är att gå från en uppsättning 2D-ledpunkter och lyfta dem till 3D-utrymme. Ett problem är den begränsade tillgången till datamängder med 3D ground truth från realistiska miljöer. Vi angriper detta problem genom att använda den begränsade Human3.6m-datasetet tillsammans med Tracab-datasetet för att uppnå tillräckliga resultat. Eftersom Tracab-datamängden är mycket stor, med miljontals unika poser och skelett, .har vi begränsat våra experiment till en fotbollsmatch. Omfattande forskning har gjorts inom området med användning av arkitekturer som konvolutionella neurala nätverk, transformerare, rumsligttemporala arkitekturer med mera. Här använder vi ett enkelt framåtriktat neuralt nätverk utvecklat av Martinez et al, vilket är möjligt tack vare den stora mängden data som är tillgänglig hos Tracab. 3D Human Pose Estimation Monocular Images Deep Learning Artificial Neural Networks 3D Människokroppspositionsuppskattning Monokulära bilder Djupinlärning Konstgjorda neurala nätverk Computer and Information Sciences Data- och informationsvetenskap

1

Page generated in 0.066 seconds