Human pose estimation recovers the skeleton of a person from color or depth images to improve a machine’s understanding of human movement. 3D human pose estimation represents body posture with a three-dimensional skeleton, which conveys spatial structure more fully than a two-dimensional skeleton. Therefore, 3D human pose estimation can enable machines to play a role in physical education and health recovery, reducing labor costs and the risk of disease transmission. However, existing datasets for 3D pose estimation do not include fast motions, which cause optical blur for a monocular camera but let the subjects’ limbs move through a wider range of angles. Existing models also cannot guarantee both real-time performance and high accuracy, both of which are essential in physical education and health recovery applications. To improve real-time performance, researchers have tried to minimize model size and have studied more efficient deployment methods. To improve accuracy, researchers have tried to represent features with heat maps or point clouds, but this makes the models harder to deploy.
To address the lack of both datasets that include fast movements and models that are easy to deploy, we present a human kinetic dataset called the Kivi dataset and a hybrid model that combines the benefits of a heat map-based model and an end-to-end model for 3D human pose estimation. We describe the process of data collection and cleaning in this thesis. Our proposed Kivi dataset contains large-scale human movements, and the human skeleton is represented by 18 joints. We collected data from 12 people, each of whom performed 38 sets of actions, so each frame of data has a corresponding person and action label. We design a preliminary model and then propose an improved model to infer 3D human poses in real time. When validating our method on the Invariant Top-View (ITOP) dataset, we found that our improved model raises the mAP@10cm by 29% compared with the preliminary model. When testing on the Kivi dataset, our improved model raises the mAP@10cm by 15.74% compared with the preliminary model. Our improved model can reach 65.89 frames per second (FPS) on the TensorRT platform.
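The abstract reports mAP@10cm without defining it. The sketch below illustrates one common reading of the metric, as typically used with ITOP: a predicted joint counts as correct when it lies within 10 cm of the ground-truth joint, the per-joint accuracy is the fraction of frames in which that joint is correct, and the mAP is the mean of the per-joint accuracies. The function name, the data layout (lists of frames of (x, y, z) tuples in metres), and the toy coordinates are assumptions made for illustration and are not taken from the thesis.

```python
import math


def map_at_threshold(pred, gt, threshold_m=0.10):
    """Mean per-joint detection rate at a distance threshold.

    pred, gt: lists of frames; each frame is a list of (x, y, z) joint
    positions in metres (assumed layout, for illustration only).
    Returns (mean over joints, list of per-joint rates).
    """
    num_frames = len(gt)
    num_joints = len(gt[0])
    hits = [0] * num_joints

    for pred_frame, gt_frame in zip(pred, gt):
        for j, (p, g) in enumerate(zip(pred_frame, gt_frame)):
            # A joint is a hit if its Euclidean error is within the threshold.
            if math.dist(p, g) <= threshold_m:
                hits[j] += 1

    per_joint = [h / num_frames for h in hits]
    return sum(per_joint) / num_joints, per_joint


# Toy data: 2 frames, 2 joints (values are made up).
gt = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
pred = [
    [(0.05, 0.0, 0.0), (1.2, 0.0, 0.0)],   # joint 1 is 20 cm off: miss
    [(0.0, 0.02, 0.0), (1.0, 0.0, 0.05)],  # both joints within 10 cm
]

score, per_joint = map_at_threshold(pred, gt)
print(score, per_joint)
```

With a skeleton such as the 18-joint one described above, each frame would simply hold 18 tuples instead of 2; the thesis may apply additional rules (for example, handling of occluded joints) that this sketch does not cover.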
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/41437 |
Date | 12 November 2020 |
Creators | Wang, Jianquan |
Contributors | El Saddik, Abdulmotaleb |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |