21 |
Multi-person Pose Estimation in Soccer Videos with Convolutional Neural Networks. Skyttner, Axel, January 2018 (has links)
Pose estimation is the problem of detecting the poses of people in images; multi-person pose estimation extends this to multiple persons per image. This thesis investigates multi-person pose estimation by applying the associative embedding method to images from soccer videos. Three models are compared: a pre-trained model, a fine-tuned model, and a model extended to handle image sequences. The pre-trained model performed well on soccer images, and the fine-tuned model performed better than the pre-trained model. The image sequence model performed on par with the fine-tuned model but no better. This thesis concludes that the associative embedding model is a feasible option for pose estimation in soccer videos and merits further research.
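The grouping step that associative embedding performs can be sketched as follows. This is a simplified, hypothetical illustration: in the real method a CNN predicts a per-keypoint "tag" value alongside each detection, and keypoints with similar tags are grouped into one person. Here the tags are given and the grouping is a single greedy pass; all names and the threshold are illustrative assumptions.

```python
import numpy as np

def group_by_tags(keypoints, tag_threshold=0.5):
    """Group detected keypoints into persons by embedding-tag similarity.

    keypoints: list of (joint_id, x, y, tag) tuples, as would be produced
    by an associative-embedding network. Simplified greedy grouping.
    """
    persons = []  # each person: {"tags": [...], "joints": {joint_id: (x, y)}}
    for joint_id, x, y, tag in keypoints:
        best, best_dist = None, tag_threshold
        for person in persons:
            if joint_id in person["joints"]:
                continue  # at most one joint of each type per person
            dist = abs(tag - np.mean(person["tags"]))
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:  # no sufficiently close tag cluster: new person
            best = {"tags": [], "joints": {}}
            persons.append(best)
        best["tags"].append(tag)
        best["joints"][joint_id] = (x, y)
    return persons

# Two people: one with tags near 0.1, one with tags near 2.0
kps = [(0, 10, 10, 0.11), (0, 50, 12, 2.02),
       (1, 11, 20, 0.09), (1, 52, 22, 1.98)]
people = group_by_tags(kps)
print(len(people))  # → 2
```

The real network learns the tags jointly with the heatmaps via a grouping loss; this sketch only shows why similar tag values suffice to assemble per-person skeletons.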
|
22 |
Height Estimation of a Blimp Unmanned Aerial Vehicle Using Inertial Measurement Unit and Infrared Camera. Villeneuve, Hubert, January 2017 (has links)
Increasing demands in areas such as security, surveillance, search and rescue, and communication have promoted the research and development of unmanned aerial vehicles (UAVs), as such technologies can replace manned flights in dangerous or unfavorable conditions. Lighter-than-air UAVs such as blimps can carry higher payloads and stay in the air longer than typical heavier-than-air UAVs such as aeroplanes or quadrotors. One purpose of this thesis is to develop a sensor suite for estimating the position and orientation of a blimp UAV under development, relative to a reference point, for safer landing procedures using minimal on-board sensors. While the existing low-cost sensor package, including an inertial measurement unit (IMU) and a Global Positioning System (GPS) module, could be sufficient to estimate the pose of the blimp to a certain extent, the GPS module is imprecise in the short term, especially for altitude. The proposed system therefore combines GPS and inertial data with information from a ground-based infrared (IR) camera. Image frames are processed to identify three IR LEDs mounted on the UAV, and the pose is estimated from the LED image coordinates using a Perspective-n-Point (PnP) algorithm. The PnP results are then fused with the GPS, accelerometer, and gyroscope measurements using an extended Kalman filter (EKF) to obtain a more accurate estimate of the position and orientation. Tests were conducted on a simulated blimp using the experimental avionics.
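The fusion idea can be illustrated with a minimal sketch: a linear Kalman filter on a 1-D altitude state, predicting with IMU acceleration and correcting with a height measurement from the camera's PnP solution (or GPS). The thesis uses a full extended Kalman filter over position and orientation; the state layout, noise values, and function names below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def kalman_height_step(x, P, accel, z_meas, dt=0.02,
                       q_accel=0.5**2, r_meas=0.1**2):
    """One predict/update cycle of a 1-D constant-acceleration height filter.

    x: state [height, vertical_velocity]; P: 2x2 covariance.
    accel: IMU vertical acceleration (control input).
    z_meas: height measurement from the IR-camera PnP (or GPS) solution.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
    B = np.array([0.5 * dt**2, dt])         # acceleration (control) input
    H = np.array([[1.0, 0.0]])              # we measure height only
    Q = np.outer(B, B) * q_accel            # process noise from accel noise

    # Predict with the IMU
    x = F @ x + B * accel
    P = F @ P @ F.T + Q
    # Update with the camera/GPS height
    y = z_meas - H @ x                      # innovation
    S = H @ P @ H.T + r_meas                # innovation covariance
    K = (P @ H.T) / S                       # Kalman gain
    x = x + (K * y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0]), np.eye(2)
for _ in range(200):  # hovering at 5 m with zero net acceleration
    x, P = kalman_height_step(x, P, accel=0.0, z_meas=5.0)
print(round(x[0], 2))  # converges toward 5.0
```

The EKF version replaces F and H with Jacobians of the nonlinear motion and measurement models, but the predict/update structure is the same.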
|
23 |
VECTOR REPRESENTATION TO ENHANCE POSE ESTIMATION FROM RGB IMAGES. Zongcheng Chu (8791457), 03 May 2020 (has links)
Head pose estimation is an essential task in computer vision. Existing research on pose estimation from RGB images mainly uses either Euler angles or quaternions to represent pose. However, both Euler-angle- and quaternion-based approaches suffer from discontinuities when describing three-dimensional rotations. This makes the visual patterns harder for a convolutional neural network (CNN) to learn, which in turn compromises estimation performance. To solve this problem, we introduce TriNet, a novel method based on three vectors converted from the three Euler angles (roll, pitch, yaw). The orthogonality of the three vectors enables a complementary multi-loss function, which effectively reduces the prediction error. Our method achieves state-of-the-art performance on the AFLW2000, AFW, and BIWI datasets. We also extend our work to general object pose estimation and report results in the experimental section.
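The three-vector idea can be sketched by converting the Euler angles to the columns of the corresponding rotation matrix, which are mutually orthogonal unit vectors and vary continuously with the rotation (unlike the angles themselves, which wrap around). The rotation convention below is one common choice and an assumption; TriNet's exact construction may differ.

```python
import numpy as np

def euler_to_vectors(roll, pitch, yaw):
    """Convert Euler angles to the three column vectors of the rotation
    matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll).  Regressing these vectors
    instead of the raw angles avoids the angular wrap-around discontinuity.
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    return R[:, 0], R[:, 1], R[:, 2]  # three mutually orthogonal unit vectors

v1, v2, v3 = euler_to_vectors(0.3, -0.2, 1.1)
print(np.dot(v1, v2))  # ~0: the regression targets stay orthogonal
```

Because the three targets must remain orthonormal, each vector's loss constrains the other two, which is what makes a complementary multi-loss possible.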
|
24 |
Generating 3D Scenes From Single RGB Images in Real-Time Using Neural Networks. Grundberg, Måns; Altintas, Viktor, January 2021 (has links)
The ability to reconstruct 3D scenes of environments is of great interest in a number of fields, such as autonomous driving, surveillance, and virtual reality. However, traditional methods often rely on multiple cameras or sensor-based depth measurements to accurately reconstruct 3D scenes. In this thesis we propose an alternative, deep-learning-based approach to 3D scene reconstruction for objects of interest, using nothing but single RGB images. We evaluate our approach using the Deep Object Pose Estimation (DOPE) neural network for object detection and pose estimation, and the NVIDIA Deep Learning Dataset Synthesizer for synthetic data generation. Using two unique objects, our results indicate that it is possible to reconstruct 3D scenes from single RGB images with an error margin of a few centimeters.
|
25 |
Vehicle-pedestrian interaction using naturalistic driving video through tractography of relative positions and pedestrian pose estimation. Mueid, Rifat M., 11 April 2017
Indiana University-Purdue University Indianapolis (IUPUI) / Research on robust Pre-Collision Systems (PCS) requires new techniques that allow a better understanding of the vehicle-pedestrian dynamic relationship and can predict pedestrians' future movements. Our research analyzed videos from the Transportation Active Safety Institute (TASI) 110-Car naturalistic driving dataset to extract two dynamic pedestrian semantic features. The dataset consists of videos recorded with forward-facing cameras from 110 cars over a year in all weather and illumination conditions. This research focuses on potential-conflict situations in which a collision may happen if no avoidance action is taken by the driver or pedestrian. We used 1,000 such 15-second videos to find vehicle-pedestrian relative dynamic trajectories and the pose of pedestrians. Adaptive structural local appearance model and particle filter methods were implemented and modified to track the pedestrians more accurately. We developed a new algorithm to compute the Focus of Expansion (FoE) automatically. The automatically detected FoE height data have a correlation of 0.98 with carefully hand-annotated data. We obtained correct tractography results for over 82% of the videos. For pose estimation, we used a flexible mixture model to capture co-occurrence between pedestrian body segments. Building on an existing single-frame human pose estimation model, we introduced Kalman filtering and temporal movement reduction techniques to produce stable stick-figure videos of pedestrians' dynamic motion. We were able to reduce the frame-to-frame pixel offset by 86% compared to the single-frame method. These tractographs and pose estimation data were used as features to train a neural network to classify 'potential conflict' and 'no potential conflict' situations. The trained network achieved a 91.2% true-label rate and an 8.8% false-label rate.
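The jitter reduction can be illustrated with a small sketch: a frame-to-frame pixel-offset metric analogous to the one reported above, together with a simple temporal smoother. Exponential smoothing stands in here for the thesis's Kalman filter, and the data and parameters are synthetic assumptions.

```python
import numpy as np

def mean_frame_offset(track):
    """Mean per-frame pixel displacement of keypoints, a simple jitter metric.

    track: array of shape (n_frames, n_joints, 2) holding (x, y) per joint.
    """
    diffs = np.diff(track, axis=0)               # (n_frames-1, n_joints, 2)
    return float(np.linalg.norm(diffs, axis=2).mean())

def smooth_track(track, alpha=0.3):
    """Exponential smoothing of joint trajectories (Kalman stand-in)."""
    out = np.empty_like(track)
    out[0] = track[0]
    for t in range(1, len(track)):
        out[t] = alpha * track[t] + (1 - alpha) * out[t - 1]
    return out

rng = np.random.default_rng(0)
truth = np.zeros((100, 13, 2))                   # stationary 13-joint figure
noisy = truth + rng.normal(0, 2.0, truth.shape)  # per-frame detection jitter
raw = mean_frame_offset(noisy)
smoothed = mean_frame_offset(smooth_track(noisy))
print(smoothed < raw)  # → True: smoothing reduces frame-to-frame offset
```

A Kalman filter improves on this by also modeling joint velocity, so genuine motion is tracked while detection noise is suppressed.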
Finally, the trained network was used to assess the probability of collision over time for the 15-second videos, generating a spike when there is a 'potential conflict' situation. We also tested our method on TASI mannequin crash data, obtaining a danger spike for 70% of those videos. This research enables new analysis of potential-conflict pedestrian cases using 2D tractography data and stick-figure pose representations of pedestrians, providing significant insight into the vehicle-pedestrian dynamics that are critical for safe autonomous driving and transportation safety innovations.
|
26 |
Exploring the Feasibility of Machine Learning Techniques in Recognizing Complex Human Activities. Hu, Shengnan, 01 January 2023 (has links) (PDF)
This dissertation introduces several technical innovations that improve the ability of machine learning models to recognize a wide range of complex human activities. As human sensor data becomes more abundant, the need to develop algorithms for understanding and interpreting complex human actions has become increasingly important. Our research focuses on three key areas: multi-agent activity recognition, multi-person pose estimation, and multimodal fusion.
To tackle the problem of monitoring coordinated team activities from spatio-temporal traces, we introduce a new framework that incorporates field of view data to predict team performance. Our framework uses Spatial Temporal Graph Convolutional Networks (ST-GCN) and recurrent neural network layers to capture and model the dynamic spatial relationships between agents. The second part of the dissertation addresses the problem of multi-person pose estimation (MPPE) from video data. Our proposed technique (Language Assisted Multi-person Pose estimation) leverages text representations from multimodal foundation models to learn a visual representation that is more robust to occlusion. By infusing semantic information into pose estimation, our approach enables precise estimations, even in cluttered scenes. The final part of the dissertation examines the problem of fusing multimodal physiological input from cardiovascular and gaze tracking sensors to exploit the complementary nature of these modalities. When dealing with multimodal features, uncovering the correlations between different modalities is as crucial as identifying effective unimodal features. This dissertation introduces a hybrid multimodal tensor fusion network that is effective at learning both unimodal and bimodal dynamics.
The outcomes of this dissertation contribute to advancing the field of complex human activity recognition by addressing the challenges associated with multi-agent activity recognition, multi-person pose estimation, and multimodal fusion. The proposed innovations have potential applications in various domains, including video surveillance, human-robot interaction, sports analysis, and healthcare monitoring. By developing intelligent systems capable of accurately recognizing complex human activities, this research paves the way for improved safety, efficiency, and decision-making in a wide range of real-world applications.
|
27 |
A Comparison of Two-Dimensional Pose Estimation Algorithms Based on Natural Features. Korte, Christopher M., 23 September 2011
No description available.
|
28 |
3D Deep Learning for Object-Centric Geometric Perception. Li, Xiaolong, 30 June 2022
Object-centric geometric perception aims at extracting the geometric attributes of 3D objects.
These attributes include the shape, pose, and motion of the target objects, which enable fine-grained object-level understanding for various tasks in graphics, computer vision, and robotics. With the growth of 3D geometry data and 3D deep learning methods, it is increasingly feasible to achieve such tasks directly from 3D input data. Among the different 3D representations, the 3D point cloud is a simple, common, and memory-efficient representation that can be retrieved directly from multi-view images, depth scans, or LiDAR range images.
Several challenges arise in object-centric geometric perception, such as achieving fine-grained geometric understanding of common articulated objects with multiple rigid parts, learning disentangled shape and pose representations with fewer labels, and tackling dynamic and sequential geometric input in an end-to-end fashion. We identify and address these challenges from a 3D deep learning perspective by designing effective and generalizable 3D representations, architectures, and pipelines. We propose the first deep pose estimation method for common articulated objects, based on a novel hierarchical invariant representation.
To push the boundary of 6D pose estimation for common rigid objects, we design a simple yet effective self-supervised framework that handles unlabeled, partially segmented scans. We further contribute a novel 4D convolutional neural network, PointMotionNet, that learns spatio-temporal features from 3D point cloud sequences. All of these works advance the domain of object-centric geometric perception from a unique 3D deep learning perspective. / Doctor of Philosophy / 3D sensors are now widely equipped on mobile devices, such as the depth camera on an iPhone or the laser LiDAR sensors on an autonomous vehicle. These 3D sensing techniques give us accurate measurements of the 3D world. In the field of machine intelligence, we likewise want to build intelligent systems and algorithms that learn useful information and understand the 3D world better.
We human beings have an incredible ability to sense and understand the 3D world through our visual and tactile systems. For example, humans can infer the geometric structure and arrangement of furniture in a room without seeing the full room, we can track a 3D object no matter how its appearance, shape, and scale change, and we can predict the future motion of multiple objects from sequential observation and complex reasoning.
My work designs frameworks that learn such 3D information from geometric data represented as large sets of 3D points. These frameworks achieve a fine-grained geometric understanding of individual objects, helping a machine infer the target objects' geometry, states, and dynamics.
The work in this dissertation serves as building blocks towards a better understanding of this dynamic world.
|
29 |
Enhancing Online Yoga Instruction: Evaluating the Effectiveness of Visual Augmentations for Performance Assessment. Gopal, Ajit Ayyadurai, 23 October 2024
Yoga is a mind-body practice known for its substantial psychological and physiological benefits, contributing to a healthy lifestyle. However, without professional guidance, individuals may experience reduced performance and an increased risk of injury. While online yoga classes on platforms like Zoom have grown in popularity, tools to support instructors in accurately assessing and monitoring student performance remain insufficient. For certain populations, this lack of real-time professional guidance poses safety risks and limits the effectiveness of the practice.
This study examined the effectiveness of computer-vision-based visual augmentations in enhancing instructors' ability to assess student performance and ensure safety. Specifically, we investigated how well various visual augmentations aided instructors' visual search for unstable or unsafe poses. Eleven certified yoga instructors (8 female, 3 male), each holding an RYT 200 to RYT 500 certification, participated in the study. Instructors completed eight trials assessing 12 yoga poses using four visual augmentations (Raw Video; Skeleton, a joint-location overlay; Contour, a participant outline; and Contour + Skeleton) across two camera views (Single vs. Multiple Views). During each trial, eye-tracking data were collected as instructors identified potentially unstable (unsafe) poses, after which they completed a usability questionnaire and a NASA-TLX rating. Upon finishing all trials, instructors provided overall feedback on the usability of the visual augmentations and camera views.

Instructors showed no significant difference in assessment performance across the visual augmentations and camera views. The Skeleton augmentation led to increased cognitive workload, as indicated by larger pupil diameters. The Contour-alone augmentation was less effective for visual search based on the usability ratings, and combining Contour with Skeleton did not offer notable improvements. Simpler visualizations, such as Raw Video and Skeleton, received higher usability ratings, and instructors preferred the Single View layout over Multiple Views for its ease of use and lower cognitive demand.
In conclusion, while the Skeleton augmentation increased cognitive load, it did not significantly enhance visual search performance. Future research should explore alternative visual augmentation techniques and configurations that better assist instructors in performance assessment without substantially increasing cognitive workload. / Master of Science / Yoga is a great way to improve both mental and physical health. However, practicing yoga without proper guidance can sometimes lead to injuries or mistakes. With more people attending yoga classes online, such as through Zoom, it is harder for instructors to closely monitor how their students are performing, which can reduce the safety and benefits of the practice.
This study looked at whether certain computer tools could help instructors better see and correct their students' poses during online yoga classes. Eleven experienced yoga instructors tried out different visual aids while watching students perform yoga poses. These aids included a simple video, a video with lines showing where the students' joints were (called Skeleton), a video that showed just the outline of the student (Contour), and a mix of both (Contour + Skeleton). The instructors were asked to identify any unstable or unsafe poses while using these aids.
The results showed that none of the visual aids helped the instructors spot mistakes better than regular video. While the Skeleton aid made the instructors work harder mentally, it didn't actually help them perform better. The instructors preferred using simple video over the more complex tools and found that using a single camera view was easier to work with.
In short, more complex visual tools didn't help instructors improve their performance. Future studies should explore other ways, like using different camera angles or adding sound, to help instructors in online yoga classes.
|
30 |
Real-Time Head Pose Estimation in Low-Resolution Football Footage / Realtidsestimering av huvudets vridning i lågupplösta videosekvenser från fotbollsmatcher. Launila, Andreas, January 2009 (has links)
This report examines the problem of real-time head pose estimation in low-resolution football footage. A method is presented for inferring the head pose using a combination of footage and knowledge of the locations of the football and players. An ensemble of randomized ferns is compared with a support vector machine for processing the footage, while a support vector machine performs pattern recognition on the location data. Combining the two sources of information outperforms either in isolation. The location of the football turns out to be an important piece of information. / QC 20100707 / Capturing and Visualizing Large scale Human Action (ACTVIS)
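The randomized-ferns idea can be sketched as follows: each fern applies a handful of random binary tests to an image patch, the resulting bit string indexes a table of per-class probabilities, and the ferns' log-probabilities are combined naive-Bayes style. This is a generic illustration on synthetic data, not the thesis's implementation; the toy features, sizes, and class structure are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

class RandomFerns:
    """Tiny random-ferns classifier: each fern runs `depth` random binary
    pixel-pair tests, and the resulting bit string selects a per-class
    probability bin. Ferns are combined semi-naive-Bayes style."""

    def __init__(self, n_ferns=10, depth=4, n_classes=2, n_features=64):
        self.pairs = rng.integers(0, n_features, (n_ferns, depth, 2))
        # Laplace-smoothed class counts per (fern, bin)
        self.counts = np.ones((n_ferns, 2 ** depth, n_classes))

    def _bins(self, X):
        a = X[:, self.pairs[..., 0]]            # (n, n_ferns, depth)
        b = X[:, self.pairs[..., 1]]
        bits = (a > b).astype(int)              # binary pixel-pair tests
        return (bits * (2 ** np.arange(bits.shape[-1]))).sum(-1)

    def fit(self, X, y):
        bins = self._bins(X)
        for i in range(len(X)):
            self.counts[np.arange(bins.shape[1]), bins[i], y[i]] += 1
        return self

    def predict(self, X):
        bins = self._bins(X)
        probs = self.counts / self.counts.sum(-1, keepdims=True)
        logp = np.log(probs[np.arange(bins.shape[1]), bins, :])
        return logp.sum(1).argmax(-1)           # sum fern log-probabilities

# Toy "patches": class 1 has its first 32 pixels brightened
X = rng.normal(size=(400, 64))
y = rng.integers(0, 2, 400)
X[y == 1, :32] += 1.5
model = RandomFerns().fit(X[:300], y[:300])
acc = (model.predict(X[300:]) == y[300:]).mean()
print(acc > 0.6)
```

Ferns trade the hierarchical splits of decision trees for flat bit strings, which makes them extremely cheap at test time, which is the property that matters for real-time footage processing.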
|