1. A Human Kinetic Dataset and a Hybrid Model for 3D Human Pose Estimation. Wang, Jianquan, 12 November 2020.
Human pose estimation represents the skeleton of a person in color or depth images to improve a machine's understanding of human movement. 3D human pose estimation uses a three-dimensional skeleton to represent the human body, conveying spatial information that a two-dimensional skeleton cannot. 3D human pose estimation can therefore enable machines to assist in physical education and health recovery, reducing labor costs and the risk of disease transmission. However, existing datasets for 3D pose estimation do not include fast motions, which cause motion blur for a monocular camera but also move the subjects' limbs through a more extensive range of angles. Existing models cannot guarantee both real-time performance and high accuracy, both of which are essential in physical education and health recovery applications. To improve real-time performance, researchers have tried to minimize model size and have studied more efficient deployment methods. To improve accuracy, researchers have tried to represent features with heat maps or point clouds, but this increases the difficulty of model deployment.
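For a concrete sense of why heat-map representations complicate deployment, the sketch below shows the extra decoding step they require: each joint's heat map must be converted back into coordinates, here with a soft-argmax in Python/NumPy. The shapes and values are illustrative, not taken from the thesis.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Decode an (H, W) joint heat map into (x, y) pixel coordinates
    via a probability-weighted average of pixel positions."""
    h, w = heatmap.shape
    prob = np.exp(heatmap - heatmap.max())  # softmax over the whole map
    prob /= prob.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((prob * xs).sum()), float((prob * ys).sum())

# Synthetic heat map sharply peaked at pixel (x=12, y=20).
hm = np.zeros((64, 48))
hm[20, 12] = 30.0
print(soft_argmax_2d(hm))  # approximately (12.0, 20.0)
```

An end-to-end model regresses coordinates directly and skips this post-processing, which is the deployment trade-off the abstract alludes to.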
To address the lack of datasets that include fast movements and of easy-to-deploy models, we present a human kinetic dataset called the Kivi dataset and a hybrid model that combines the benefits of a heat map-based model and an end-to-end model for 3D human pose estimation. We describe the process of data collection and cleaning in this thesis. Our proposed Kivi dataset contains large-scale human movements. In the dataset, 18 joint points represent the human skeleton. We collected data from 12 people, each of whom performed 38 sets of actions, so each frame of data has a corresponding person and action label. We design a preliminary model and propose an improved model to infer 3D human poses in real time. When validating our method on the Invariant Top-View (ITOP) dataset, we found that our improved model improves mAP@10cm by 29% compared with the initial model. When testing on the Kivi dataset, our improved model improves mAP@10cm by 15.74% compared to the preliminary model. Our improved model reaches 65.89 frames per second (FPS) on the TensorRT platform.
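As a reference for the reported numbers, here is a minimal sketch of one common reading of the mAP@10cm metric: the fraction of predicted joints falling within 10 cm of the ground truth, averaged per joint. The array shapes and example data are assumptions for illustration.

```python
import numpy as np

def map_at_threshold(pred, gt, thresh_m=0.10):
    """Fraction of joints predicted within `thresh_m` metres of ground
    truth, computed per joint and then averaged over joints (one common
    reading of the mAP@10cm metric used in ITOP-style evaluation).

    pred, gt: arrays of shape (num_frames, num_joints, 3) in metres.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)      # (frames, joints)
    per_joint_ap = (dists < thresh_m).mean(axis=0)  # detection rate per joint
    return per_joint_ap.mean()

# Example with synthetic data: 100 frames, 18 joints (as in the Kivi dataset).
rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 18, 3))
pred = gt + rng.normal(scale=0.05, size=gt.shape)   # ~5 cm noise per axis
print(f"mAP@10cm: {map_at_threshold(pred, gt):.3f}")
```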
2. Estimation of Human Poses Categories and Physical Object Properties from Motion Trajectories. Fathollahi Ghezelghieh, Mona, 22 June 2017.
Despite impressive advances in people detection and tracking, safety remains a key barrier to the deployment of autonomous vehicles in urban environments [1]. For example, with a human driver, people crossing the street implicitly communicate their intent to the driver. It is therefore crucial for an autonomous car to infer a pedestrian's future intent quickly. We believe that human body orientation with respect to the camera can help the car's intelligent unit anticipate the future movement of pedestrians. To further improve pedestrian safety, it is important to recognize whether a pedestrian is distracted, carrying a baby, or pushing a shopping cart. Estimating the fine-grained 3D pose, i.e., the (x, y, z)-coordinates of the body joints, therefore provides additional information for the decision-making units of driverless cars.
In this dissertation, we have proposed a deep learning-based solution for classifying body orientation into discrete categories in still images. We have also proposed an efficient framework, based on our body orientation classification scheme, to estimate human 3D pose in monocular RGB images.
Furthermore, we have utilized the dynamics of human motion to infer the body orientation in image sequences. To achieve this, we employ a recurrent neural network model to estimate continuous body orientation from the trajectories of body joints in the image plane.
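To make the recurrent formulation concrete, below is a minimal sketch of an RNN that maps a sequence of 2D joint positions to a continuous orientation angle. The layer sizes, the GRU choice, and the sin/cos output parameterization are illustrative assumptions, not the dissertation's exact architecture.

```python
import torch
import torch.nn as nn

class OrientationRNN(nn.Module):
    """Sketch: recurrent regression from joint trajectories in the image
    plane to a continuous body orientation."""
    def __init__(self, num_joints=14, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=num_joints * 2, hidden_size=hidden,
                          batch_first=True)
        # Predict (cos θ, sin θ) to avoid the 0/360-degree discontinuity.
        self.head = nn.Linear(hidden, 2)

    def forward(self, joints):                  # (batch, time, joints*2)
        out, _ = self.rnn(joints)
        cs = self.head(out[:, -1])              # use the last time step
        return torch.atan2(cs[:, 1], cs[:, 0])  # orientation in radians

model = OrientationRNN()
seq = torch.randn(4, 30, 14 * 2)                # 4 clips, 30 frames each
print(model(seq).shape)                          # torch.Size([4])
```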
The proposed body orientation and 3D pose estimation frameworks are tested on the largest 3D pose estimation benchmark, Human3.6M (both on still images and video), and we demonstrate the efficacy of our approach by benchmarking it against state-of-the-art approaches.
Another critical capability of a self-driving car is obstacle avoidance. In current prototypes, the car either stops or changes lanes, even if doing so disrupts other traffic. However, there are situations in which it is preferable to collide with an object, for example a foam box, rather than take an action that could result in a much more serious accident than the collision itself. In this dissertation, for the first time, we present a novel method to discriminate between physical properties of such objects, such as bounciness and elasticity, based on their motion characteristics. The proposed algorithm is tested on synthetic data, and, as a proof of concept, its effectiveness is demonstrated on a limited set of real-world data.
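As an illustration of inferring a physical property from motion characteristics, the sketch below estimates a bounciness measure (coefficient of restitution) from the successive peak heights of a bouncing trajectory. This is a hypothetical feature of the kind such a classifier could use, not the dissertation's algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

def restitution_from_heights(y):
    """Estimate the coefficient of restitution e from a 1D height
    trajectory: for ideal ballistic bounces, h_{k+1} / h_k = e**2."""
    peaks, _ = find_peaks(y)
    heights = y[peaks]
    if len(heights) < 2:
        return None
    ratios = heights[1:] / heights[:-1]
    return float(np.sqrt(np.median(ratios)))

# Synthetic trajectory: |sin| arcs with peak heights decaying by 0.64x,
# i.e. a ball with restitution 0.8.
t = np.linspace(0, 3, 90)
y = np.abs(np.sin(2 * np.pi * t)) * 0.8 ** (2 * np.floor(2 * t))
print(restitution_from_heights(y))  # roughly 0.8
```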
3. Using Pitch Tipping for Baseball Pitch Prediction. Ishii, Brian, 01 June 2021.
Data analytics and technology have changed baseball as we know it. From the increase in defensive shifts to teams using cameras in the outfield to steal signs, teams will try anything to win. One way to gain an edge in baseball is to figure out what pitches a pitcher will throw. Pitch prediction is a popular task to attempt with all the data that baseball provides, and most methods rely on situational data like the ball and strike count. In this paper, we try a different method of predicting pitch type by looking only at the pitcher's pose in the set position. We do this to find a pitcher's tell, or "tip". In baseball, a pitcher who is tipping their pitches is doing something that gives away what they will throw. This could be a change in grip on the ball for certain pitches, or something as small as a different flex in the wrist. Professional baseball players study pitchers before the ball is thrown to try to pick up on these tips; if a tip is found, the batters have a significant advantage over the pitcher. Our paper uses pose estimation and object detection to predict pitch type based on the pitcher's set position before throwing the ball. Given a successful model, we can extract the important features, i.e., the potential tip, from the data, and then try to predict the pitches ourselves like a batter. We tested this method on three pitchers: Tyler Glasnow, Yu Darvish, and Stephen Strasburg. Our results demonstrate that when we predict pitch type at 70% accuracy, we can reasonably extract useful features. However, finding a useful tip among these features still requires manual observation.
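To illustrate the feature-extraction idea, here is a minimal sketch of training a classifier on flattened set-position pose keypoints and reading off the most predictive coordinates, which is where a tip would show up. The synthetic data, keypoint count, and the random-forest choice are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical setup: each row is a pitcher's set-position pose
# (x, y for 17 keypoints, flattened); the label is the pitch type.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 34))              # 300 pitches, 17 keypoints * 2
y = rng.choice(["fastball", "slider", "curveball"], size=300)
# Plant a synthetic "tip": one wrist coordinate shifts for sliders.
X[y == "slider", 18] += 0.5

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# The most important features point at the body part doing the tipping.
top = np.argsort(clf.feature_importances_)[::-1][:3]
print("most predictive keypoint coordinates:", top)  # feature 18 should rank highly
```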
4. Human-Robot Interaction with Pose Estimation and Dual-Arm Manipulation Using Artificial Intelligence. Ren, Hailin, 16 April 2020.
This dissertation focuses on applying artificial intelligence techniques to human-robot interaction, which involves human pose estimation and dual-arm robotic manipulation. The motivating application behind this work is autonomous victim extraction in disaster scenarios using a conceptual design of a Semi-Autonomous Victim Extraction Robot (SAVER). SAVER is equipped with an advanced sensing system, two powerful robotic manipulators, and a head and neck stabilization system to achieve autonomous, safe, and effective victim extraction, thereby reducing the potential risk to field medical providers. This dissertation formulates the autonomous victim extraction process using a dual-arm robotic manipulation system for human-robot interaction. Following the general process of Human-Robot Interaction (HRI), which includes perception, control, and decision-making, this research applies machine learning techniques to human pose estimation, robotic manipulator modeling, and dual-arm robotic manipulation, respectively.

For human pose estimation, an efficient parallel ensemble-based neural network is developed to provide real-time human pose estimation on 2D RGB images. A 13-limb, 14-joint skeleton model is used in this perception network, and each ensemble of the network is designed to detect a specific limb. The parallel structure offers two main benefits: (1) the parallel ensemble architecture and multiple Graphics Processing Units (GPUs) make distributed computation possible, and (2) each individual ensemble can be deployed independently, making processing more efficient when a task requires detecting only specific limbs (see the sketch at the end of this entry).

Precise robotic manipulator modeling simplifies controller design and improves trajectory-following performance. Traditional system modeling relies on first principles, simplifying assumptions, and prior knowledge; any imperfection in these can lead to an analytical model that differs from the real system. Machine learning techniques have been applied in this field to pursue faster computation and more accurate estimation, but they typically require a large dataset, and obtaining data from the real system can be costly in terms of both time and maintenance. In this research, a series of Generative Adversarial Networks (GANs) is proposed to efficiently identify the inverse kinematics and inverse dynamics of robotic manipulators. One four-Degree-of-Freedom (DOF) robotic manipulator and one six-DOF robotic manipulator are used with datasets of different sizes to evaluate the performance of the proposed GANs. The methods can also be adapted to other systems for which only limited data are available.

In dual-arm robotic manipulation, basic behaviors such as reaching, pushing objects, and picking objects up are learned using reinforcement learning. A teacher-student advising framework is proposed to learn a single neural network that controls dual-arm robotic manipulators using prior knowledge of controlling a single robotic manipulator. Simulation and experimental results demonstrate the efficiency of the proposed framework compared to learning from scratch. Another concern in robotic manipulation is safety constraints. A variable-reward hierarchical reinforcement learning framework is proposed to handle sparse rewards and tasks with constraints.
A task of picking up two objects and placing them at target positions, while keeping the distance between them fixed to within a threshold, is used to evaluate the performance of the proposed method. Comparisons to other state-of-the-art methods are also presented. Finally, all three proposed components are integrated into a single system. Experimental evaluation with a full-size manikin was performed to validate the concept of applying artificial intelligence techniques to autonomous victim extraction using a dual-arm robotic manipulation system.

/ Doctor of Philosophy / Using mobile robots for autonomous victim extraction in disaster scenarios reduces the potential risk to field medical providers. This dissertation focuses on applying artificial intelligence techniques to this human-robot interaction task, involving pose estimation and dual-arm manipulation for victim extraction. The work is based on a design of a Semi-Autonomous Victim Extraction Robot (SAVER), which is equipped with an advanced sensing system, two powerful robotic manipulators, and a head and neck stabilization system attached to an embedded declining stretcher to achieve autonomous, safe, and effective victim extraction. The overall research in this dissertation therefore addresses human pose estimation, robotic manipulator modeling, and dual-arm robotic manipulation for human pose adjustment. To accurately estimate human pose in real-time applications, the dissertation proposes a neural network that can take advantage of multiple Graphics Processing Units (GPUs). Considering the cost of data collection, the dissertation proposes novel machine learning techniques to obtain the inverse dynamic and inverse kinematic models of the robotic manipulators from limited collected data. Respecting safety constraints is another requirement when robots interact with humans. This dissertation proposes reinforcement learning techniques to efficiently train a dual-arm manipulation system not only to perform basic behaviors, such as reaching, pushing objects, and picking up and placing objects, but also to take safety constraints into consideration while performing tasks. Finally, the three components mentioned above are integrated into a complete system. Experimental validation and results are discussed at the end of this dissertation.
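The sketch below illustrates the parallel per-limb ensemble idea referenced above: one small detector head per limb over a shared backbone, so that individual heads can be evaluated or deployed independently. All architectural details here are illustrative assumptions, not the dissertation's network.

```python
import torch
import torch.nn as nn

class LimbEnsemble(nn.Module):
    """Sketch: a shared backbone with one detector head per limb, so each
    head can run on its own GPU or be skipped when not needed."""
    def __init__(self, num_limbs=13, feat=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One heat-map head per limb.
        self.heads = nn.ModuleList(
            nn.Conv2d(feat, 1, 1) for _ in range(num_limbs)
        )

    def forward(self, img, limbs=None):
        f = self.backbone(img)
        idx = limbs if limbs is not None else range(len(self.heads))
        # Only the requested limb detectors are evaluated.
        return {i: self.heads[i](f) for i in idx}

net = LimbEnsemble()
x = torch.randn(1, 3, 128, 128)
print({k: v.shape for k, v in net(x, limbs=[0, 5]).items()})
```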
5. Single View Human Pose Tracking. Li, Zhenning, January 2013.
Recovery of human pose from videos has become a highly active research area in the last decade because of many attractive potential applications, such as surveillance, non-intrusive motion analysis, and natural human-machine interaction. Video-based full-body pose estimation is a very challenging task because of the high degree of articulation of the human body, the large variety of possible human motions, and the diversity of human appearances.
Methods for tackling this problem can be roughly categorized as either discriminative or generative. Discriminative methods can work on single images and recover human poses efficiently; however, their accuracy and generality largely depend on the training data. Generative approaches usually formulate the problem as a tracking problem and adopt an explicit human model. Although arbitrary motions can be tracked, such systems usually have difficulty adapting to different subjects and dealing with tracking failures.
In this thesis, an accurate, efficient, and robust human pose tracking system using a single-view camera is developed, mainly following a generative approach. A novel discriminative feature is also proposed and integrated into the tracking framework to improve tracking performance.
The human pose tracking system is proposed within a particle filtering framework. A reconfigurable skeleton model is constructed based on the Acclaim Skeleton File convention. A basic particle filter is first implemented for upper-body tracking; it fuses time-efficient cues from monocular sequences and achieves real-time tracking for constrained motions. Next, a 3D surface model is added to the skeleton model, and a full-body tracking system is developed for more general and complex motions, assuming a stereo camera input. Partitioned sampling is adopted to deal with the high dimensionality of the problem, and the system is capable of running in near real time. Multiple visual cues are investigated and compared, including a newly developed explicit depth cue.
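For readers unfamiliar with particle filtering, here is a minimal predict-update-resample step of the kind such a tracker builds on, applied to a toy low-dimensional pose. The Gaussian random-walk motion model and the toy likelihood are simplifying assumptions; partitioned sampling for high-dimensional poses is not shown.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, motion_std=0.02):
    """One predict-update-resample step of a basic particle filter over
    pose parameters. `likelihood(pose)` scores a pose against image cues."""
    n, d = particles.shape
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + np.random.normal(0, motion_std, size=(n, d))
    # Update: reweight by how well each pose explains the image cues.
    weights = weights * np.array([likelihood(p) for p in particles])
    weights /= weights.sum()
    # Resample to concentrate particles on high-likelihood poses.
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)

# Toy example: track a 2-DOF "pose" whose true value is (0.5, -0.2).
truth = np.array([0.5, -0.2])
lik = lambda p: np.exp(-np.sum((p - truth) ** 2) / 0.01)
parts, w = np.random.uniform(-1, 1, (500, 2)), np.full(500, 1 / 500)
for _ in range(20):
    parts, w = particle_filter_step(parts, w, lik)
print(parts.mean(axis=0))  # close to (0.5, -0.2)
```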
Based on the comparative analysis of cues, which reveals the importance of depth and good bottom-up features, a novel algorithm for detecting and identifying endpoint body parts from depth images is proposed. Inspired by the shape context concept, this thesis proposes a novel Local Shape Context (LSC) descriptor specifically for describing the shape features of body parts in depth images. The descriptor captures the local shape of different body parts with respect to a given reference point on a human silhouette, and is shown to be effective at detecting and classifying endpoint body parts. A new type of interest point is defined based on the LSC descriptor, and a hierarchical interest point selection algorithm is designed to further conserve computational resources. The detected endpoint body parts are then classified according to learned models based on the LSC feature. The algorithm is tested on a public dataset and achieves good accuracy with a 100 Hz processing speed on a standard PC.
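The sketch below computes a plain shape-context-style descriptor, a log-polar histogram of contour points around a reference point, to convey the idea that the LSC descriptor refines for depth-image body parts. The bin counts and radii are illustrative assumptions.

```python
import numpy as np

def local_shape_context(contour, ref, n_r=5, n_theta=12, r_max=100.0):
    """Sketch: log-polar histogram of silhouette contour points around a
    reference point `ref`. contour: (N, 2) array of (x, y) points."""
    d = contour - ref
    r = np.linalg.norm(d, axis=1)
    theta = np.arctan2(d[:, 1], d[:, 0])          # in [-pi, pi]
    keep = (r > 1e-6) & (r < r_max)
    # Logarithmic radial bins emphasize nearby shape detail.
    r_bin = np.floor(n_r * np.log1p(r[keep]) / np.log1p(r_max)).astype(int)
    t_bin = np.floor((theta[keep] + np.pi) / (2 * np.pi) * n_theta).astype(int)
    t_bin = np.clip(t_bin, 0, n_theta - 1)
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)
    return hist / max(hist.sum(), 1)              # normalized descriptor

# Toy silhouette: points on a circle, described relative to its center.
ang = np.linspace(0, 2 * np.pi, 100, endpoint=False)
pts = np.stack([50 * np.cos(ang), 50 * np.sin(ang)], axis=1)
print(local_shape_context(pts, ref=np.array([0.0, 0.0])).shape)  # (5, 12)
```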
Finally, the LSC descriptor is generalized so that both the endpoint body parts and the limbs are detected simultaneously. The generalized algorithm is integrated into the tracking framework, where it provides a very strong cue and enables recovery from tracking failures. The skeleton model is also simplified to further increase system efficiency. To evaluate the system quantitatively on arbitrary motions, a new dataset is designed and collected using a synchronized Kinect sensor and a marker-based motion capture system, comprising 22 different motions from 5 human subjects. The system is capable of tracking full-body motions accurately using a simple skeleton-only model in near real time on a laptop PC, even before optimization.
6. Security with visual understanding: Kinect human recognition capabilities applied in a home security system. Fluckiger, S Joseph, 08 August 2012.
Vision is the most celebrated human sense. Eighty percent of the information humans receive is obtained through vision. Machines capable of capturing images are now ubiquitous, but until recently, they have been unable to recognize objects in the images they capture. In effect, machines have been blind.
This paper explores the revolutionary new capability of a camera to recognize whether a human is present in an image and take detailed measurements of the person’s dimensions. It explains how the hardware and software of the camera work to provide this remarkable capability in just 200 milliseconds per image.
To demonstrate these capabilities, a home security application called Security with Visual Understanding (SVU) has been built. SVU is a hardware/software solution that detects a human and then performs biometric authentication by comparing the dimensions of the detected person against a database of known people. If the person is unrecognized, an alarm is sounded and a picture of the intruder is sent via SMS text message to the homeowner. An analysis measures the tolerance of the SVU algorithm for differentiating between two people based on their body dimensions.
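A minimal sketch of the matching step such a system could use: compare measured limb lengths against a database of known people and accept the nearest match only within a tolerance. The limb set and the 2 cm tolerance are illustrative assumptions, not the paper's measured values.

```python
import numpy as np

# Hypothetical database of known people: limb lengths in metres
# (e.g. upper arm, forearm, torso, thigh, shin) measured by the Kinect.
KNOWN = {
    "alice": np.array([0.31, 0.26, 0.52, 0.44, 0.41]),
    "bob":   np.array([0.34, 0.29, 0.58, 0.49, 0.45]),
}

def identify(measured, tolerance=0.02):
    """Return the nearest known person if the mean per-limb error is
    within `tolerance` metres; otherwise None (the alarm path)."""
    name, err = min(
        ((n, np.abs(measured - dims).mean()) for n, dims in KNOWN.items()),
        key=lambda t: t[1],
    )
    return name if err <= tolerance else None

print(identify(np.array([0.32, 0.26, 0.53, 0.44, 0.40])))  # alice
print(identify(np.array([0.40, 0.35, 0.70, 0.60, 0.55])))  # None -> intruder
```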
7. Discriminative pose estimation using mixtures of Gaussian processes. Fergie, Martin Paul, January 2013.
This thesis proposes novel algorithms for using Gaussian processes for discriminative pose estimation. We overcome the traditional limitations of Gaussian processes, their cubic training complexity and their uni-modal predictive distribution, by assembling them in a mixture-of-experts formulation.

Our first contribution shows that by creating a large number of fixed-size Gaussian process experts, we can build a model that scales to large datasets and accurately learns the multi-modal and non-linear mapping between image features and the subject's pose. We demonstrate that this model gives state-of-the-art performance compared to other discriminative pose estimation techniques.

We then extend the model to automatically learn the size and location of each expert. Gaussian processes can accurately model non-linear functional regression problems where the output is given as a function of the input. However, when an individual Gaussian process is trained on data that contains multi-modalities, or varying levels of ambiguity, it cannot model the data accurately. We propose a novel algorithm for learning the size and location of each expert in our mixture of Gaussian processes model, so that the training data of each expert matches the assumptions of a Gaussian process. We show that this model outperforms our previous mixture of Gaussian processes model.

Our final contribution is a dynamics framework for inferring a smooth sequence of pose estimates from a sequence of independent predictive distributions. Discriminative pose estimation infers the pose of each frame independently, leading to jittery tracking results. Our novel algorithm uses a model of human dynamics to infer a smooth path through a sequence of Gaussian mixture models as given by our mixture of Gaussian processes model. We show that our algorithm smooths and corrects some mistakes made by the appearance model alone, and outperforms a baseline linear dynamical system.
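As a rough illustration of the first contribution's scaling idea, the sketch below clusters the inputs, trains one fixed-size GP expert per cluster, and gates predictions by cluster proximity. The expert assignment and gating in the thesis are learned rather than fixed like this, and this simple gated mean does not capture multi-modality.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_moe_gp(X, y, n_experts=4):
    """Sketch: a mixture of fixed-size GP experts. Each expert trains on
    one cluster, sidestepping the cubic cost of a single large GP."""
    km = KMeans(n_clusters=n_experts, n_init=10, random_state=0).fit(X)
    experts = [
        GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-3).fit(
            X[km.labels_ == k], y[km.labels_ == k])
        for k in range(n_experts)
    ]
    def predict(Xq):
        d = np.linalg.norm(Xq[:, None] - km.cluster_centers_[None], axis=2)
        gate = np.exp(-d)                      # soft responsibility per expert
        gate /= gate.sum(axis=1, keepdims=True)
        preds = np.stack([e.predict(Xq) for e in experts], axis=1)
        return (gate * preds).sum(axis=1)      # gated mixture prediction
    return predict

# Toy 1D regression: y = sin(x) with noise.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(400, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=400)
predict = fit_moe_gp(X, y)
print(predict(np.array([[1.0], [-2.0]])))      # close to sin(1), sin(-2)
```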
8. Human pose augmentation for facilitating Violence Detection in videos: a combination of the deep learning methods DensePose and VioNet. Calzavara, Ivan, January 2020.
In recent years, deep learning, a critical technology in computer vision, has achieved remarkable milestones in many fields, such as image classification and object detection. In particular, it has been applied to the problem of violence detection, which is a big challenge given the difficulty of establishing an exact definition of violence. Thanks to the ever-increasing deployment of surveillance technology, we now have access to enormous collections of video that can be analyzed for abnormal behavior; however, with such a huge amount of data it is unrealistic to examine it all manually. Deep learning techniques, instead, can automatically study, learn, and perform classification. In the context of violence detection, by extracting harmful visual patterns, it is possible to design descriptors that represent the features identifying them.

In this research we tackle the task of generating augmented datasets in order to simplify the identification step performed by a deep learning violence detection technique. The novelty of this work is to use the DensePose model to enrich the images in a dataset by highlighting (i.e., identifying and segmenting) all the human beings present in them. With this approach we gained knowledge of how this algorithm performs on videos with a violent context and how the violence detection network benefits from this procedure. Performance was evaluated in terms of segmentation accuracy and the efficiency of the violence detection network, as well as from the computational point of view. Results show that the context of the scene is the major factor determining whether the DensePose model segments human beings correctly, and that violent scenes are not the most suitable setting for this model, since the frequent overlap of bodies (a distinctive aspect of violence) works against segmentation; for this reason, the violence detection network does not exploit its full potential. Finally, we found that such augmented datasets can boost training speed by reducing the time needed for the weight-update phase, making this procedure a helpful add-on for implementations in different contexts where the identification of human beings plays the major role.
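A minimal sketch of the augmentation step described above: given a person mask of the kind a DensePose-style model produces, highlight the segmented bodies before the frame is passed to the violence-detection network. Obtaining the mask from an actual DensePose installation is omitted here; the tint color and blending weight are assumptions.

```python
import numpy as np

def augment_frame(frame, person_mask, alpha=0.6):
    """Blend a highlight color into the pixels covered by `person_mask`
    (a boolean (H, W) array), leaving the background untouched."""
    highlight = frame.copy()
    highlight[person_mask] = (
        alpha * np.array([0, 0, 255]) + (1 - alpha) * frame[person_mask]
    ).astype(frame.dtype)
    return highlight

# Toy example: a grey frame with a rectangular "person" region.
frame = np.full((120, 160, 3), 128, dtype=np.uint8)
mask = np.zeros((120, 160), dtype=bool)
mask[30:100, 60:100] = True
aug = augment_frame(frame, mask)
print(aug[60, 80], aug[0, 0])  # highlighted pixel vs untouched background
```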
9. Machine Learning Aided Millimeter Wave System for Real Time Gait Analysis. Alanazi, Mubarak Alayyat, 10 August 2022.
No description available.