1 |
A Human Kinetic Dataset and a Hybrid Model for 3D Human Pose Estimation
Wang, Jianquan 12 November 2020
Human pose estimation represents the skeleton of a person in color or depth images to improve a machine’s understanding of human movement. 3D human pose estimation represents body posture with a three-dimensional skeleton, which captures depth information that a two-dimensional skeleton lacks. Therefore, 3D human pose estimation can enable machines to play a role in physical education and health recovery, reducing labor costs and the risk of disease transmission. However, existing datasets for 3D pose estimation do not include fast motions, which cause optical blur for a monocular camera but let the subjects’ limbs move through a wider range of angles. Existing models also cannot guarantee both real-time performance and high accuracy, which are essential in physical education and health recovery applications. To improve real-time performance, researchers have tried to minimize model size and have studied more efficient deployment methods. To improve accuracy, researchers have used heat maps or point clouds to represent features, but this makes the model harder to deploy.
To address the lack of datasets that include fast movements and easy-to-deploy models, we present a human kinetic dataset called the Kivi dataset and a hybrid model that combines the benefits of a heat map-based model and an end-to-end model for 3D human pose estimation. We describe the process of data collection and cleaning in this thesis. Our proposed Kivi dataset contains large-scale movements of humans. In the dataset, 18 joint points represent the human skeleton. We collected data from 12 people, and each person performed 38 sets of actions. Therefore, each frame of data has a corresponding person and action label. We design a preliminary model and propose an improved model to infer 3D human poses in real time. When validating our method on the Invariant Top-View (ITOP) dataset, we found that compared with the initial model, our improved model improves the mAP@10cm by 29%. When testing on the Kivi dataset, our improved model improves the mAP@10cm by 15.74% compared to the preliminary model. Our improved model can reach 65.89 frames per second (FPS) on the TensorRT platform.
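The mAP@10cm figures quoted above can be illustrated with a small sketch. It assumes the definition commonly used with the ITOP benchmark: a predicted joint counts as correct when it lies within 10 cm of the ground truth, precision is computed per joint, and the mean is taken over joints. The abstract does not spell out the exact definition used in the thesis, so the function name `map_at_threshold` and the data layout (frames of `(x, y, z)` joints in meters) are illustrative assumptions.

```python
import math

def map_at_threshold(predicted, ground_truth, threshold_m=0.10):
    """Mean over joints of the fraction of frames in which the joint is
    predicted within `threshold_m` meters of the ground truth.

    predicted, ground_truth: lists of frames; each frame is a list of
    (x, y, z) joint positions in meters (assumed layout).
    """
    num_frames = len(ground_truth)
    num_joints = len(ground_truth[0])
    hits = [0] * num_joints
    for pred_frame, true_frame in zip(predicted, ground_truth):
        for j, (p, t) in enumerate(zip(pred_frame, true_frame)):
            if math.dist(p, t) <= threshold_m:
                hits[j] += 1
    per_joint_precision = [h / num_frames for h in hits]
    return sum(per_joint_precision) / num_joints
```

With 18 joints per frame, as in the Kivi dataset, each frame would contribute 18 distance checks to this score.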
|
2 |
3D reconstruction of a catheter path from a single view X-ray sequence
Weng, Ji Yao January 2003
Thesis digitized by the Direction des bibliothèques of the Université de Montréal.
|
3 |
A Deep 3D Object Pose Estimation Framework for Robots with RGB-D Sensors
Wagh, Ameya Yatindra 24 April 2019
Object detection and pose estimation have widely been done using template matching techniques. However, these algorithms are sensitive to outliers and occlusions, and have high latency due to their iterative nature. Recent research in computer vision and deep learning has shown great improvements in the robustness of these algorithms. However, one of the major drawbacks of these algorithms is that they are specific to the objects they were trained on. Moreover, the pose estimate depends significantly on RGB image features. As these algorithms are trained on large datasets meticulously labeled with each object's ground-truth pose, it is difficult to re-train them for real-world applications. To overcome this problem, we propose a two-stage pipeline of convolutional neural networks which uses RGB images to localize objects in 2D space and depth images to estimate a 6DoF pose. Thus the pose estimation network learns only the geometric features of the object and is not biased by its color features. We evaluate the performance of this framework on the LINEMOD dataset, which is widely used to benchmark object pose estimation frameworks. We found the results to be comparable with state-of-the-art algorithms using RGB-D images. Second, to show the transferability of the proposed pipeline, we implement it on the ATLAS robot for a pick-and-place experiment. As the distribution of images in the LINEMOD dataset differs from that of the images captured by the MultiSense sensor on ATLAS, we generate a synthetic dataset from very few real-world images captured by the MultiSense sensor. We use this dataset to train just the object detection networks used in the ATLAS robot experiment.
|
4 |
Simultaneous Pose and Correspondence Problem for Visual Servoing
Chiu, Raymond January 2010
Pose estimation is a common problem in computer vision. The pose is the combination of the position and orientation of a particular object relative to some reference coordinate system. The pose estimation problem involves determining the pose of an object from one or multiple images of the object. This problem often arises in the area of robotics. It is necessary to determine the pose of an object before it can be manipulated by the robot. In particular, this research focuses on pose estimation for initialization of position-based visual servoing.
A closely related problem is the correspondence problem. This is the problem of finding a set of features from the image of an object that can be identified as the same feature from a model of the object. Solving for pose without known correspondence is also referred to as the simultaneous pose and correspondence problem, and it is much more difficult than solving for pose with known correspondence.
This thesis explores a number of methods to solve the simultaneous pose and correspondence problem, with a focus on a method called SoftPOSIT. It uses the idea that the pose is easily determined if correspondence is known. It first produces an initial guess of the pose and uses it to determine a correspondence. With the correspondence, it determines a new pose. This new pose is assumed to be a better estimate, thus a better correspondence can be determined. The process is repeated until the algorithm converges to a correspondence and pose estimate. If this pose estimate is not good enough, the algorithm is restarted with a new initial guess.
An improvement is made to this algorithm. An early termination condition is added to detect conditions where the algorithm is unlikely to converge towards a good pose. This leads to a reduction in the runtime by as much as 50% and an improvement in the success rate of the algorithm by approximately 5%.
The proposed solution is tested and compared with the RANSAC method and simulated annealing in a simulation environment. It is shown that the proposed solution has the potential for use in commercial environments for pose estimation.
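The alternating structure described in this abstract (guess a pose, derive a correspondence, re-solve the pose, repeat, restart if the result is poor) can be sketched in a few lines. This is not SoftPOSIT: as a simplifying assumption it works on 2D points, uses hard nearest-neighbour matching instead of soft assignment, and solves the pose with a closed-form 2D rigid fit. The stopping rule is a plain no-improvement test rather than the early termination condition proposed in the thesis. All function names are hypothetical.

```python
import math

def transform(pose, point):
    """Apply a 2D rigid pose (theta, tx, ty) to a point (x, y)."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)

def fit_rigid_2d(model, matched):
    """Least-squares rigid pose mapping model[i] onto matched[i]."""
    n = len(model)
    mcx = sum(p[0] for p in model) / n
    mcy = sum(p[1] for p in model) / n
    icx = sum(p[0] for p in matched) / n
    icy = sum(p[1] for p in matched) / n
    s_dot = s_cross = 0.0
    for (mx, my), (ix, iy) in zip(model, matched):
        ax, ay = mx - mcx, my - mcy
        bx, by = ix - icx, iy - icy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return (theta, icx - (c * mcx - s * mcy), icy - (s * mcx + c * mcy))

def refine_pose(model, image, pose, max_iters=50, tol=1e-12):
    """Alternate between a correspondence step and a pose step."""
    error = math.inf
    for _ in range(max_iters):
        # Correspondence step: match each projected model point to the
        # nearest image point under the current pose.
        matched = []
        for m in model:
            projected = transform(pose, m)
            matched.append(min(image, key=lambda p: math.dist(p, projected)))
        # Pose step: re-solve the pose for the current correspondence.
        pose = fit_rigid_2d(model, matched)
        new_error = sum(math.dist(transform(pose, m), p) ** 2
                        for m, p in zip(model, matched)) / len(model)
        converged = error - new_error < tol
        error = new_error
        if converged:
            break
    return pose, error

def estimate_with_restarts(model, image, initial_guesses, good_enough=1e-6):
    """Restart from each guess in turn until the fit is good enough."""
    best = None
    for guess in initial_guesses:
        pose, error = refine_pose(model, image, guess)
        if best is None or error < best[1]:
            best = (pose, error)
        if error <= good_enough:
            break
    return best
```

Hard nearest-neighbour matching is only reliable when the initial guess is already close to the true pose, which is one reason a restart loop is needed at all.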
|
6 |
An Improved Path Integration Mechanism Using Neural Fields Which Implement A Biologically Plausible Analogue To A Kalman Filter
Connors, Warren Anthoney 22 February 2013
Interaction with the world is necessary for both animals and robots to complete tasks. This interaction requires a sense of self, or the orientation of the robot or animal with respect to the world. Creating and maintaining this model is a task that animals perform easily, but it can be difficult for robots due to the uncertainties in the world, sensing, and movement of the robot. This estimation difficulty is increased in sensory-deprived environments, where no external inputs are available to correct the estimate. Therefore, self-generated cues of movement are needed, such as vestibular input in an animal or accelerometer input in a robot. In spite of the difficulties, animals can easily maintain this model. This leads to the question of whether we can learn from nature by examining the biological mechanisms for pose estimation in animals. Previous work has shown that neural fields coupled with a mechanism for updating the estimate can be used to maintain a pose estimate through a sustained area of activity called a packet. Analysis of this mechanism, however, has shown conditions where the field can provide unexpected results or break down due to high accelerations input into the field. This analysis illustrates the challenges of controlling the activity packet size under strong inputs, and the limited speed capability of the existing mechanism. As a result, a novel weight combination method is proposed to provide a higher speed and increased robustness. The result is an increase of over two times the existing speed capability, and a resistance of the field to breaking down under strong rotational inputs.
This updated neural field model provides a method for maintaining a stable pose estimate. To show this, a novel comparison between the proposed neural field model and the Kalman filter is considered, resulting in comparable performance in pose prediction. This work shows that an updated neural field model provides a biologically plausible pose prediction model using Bayesian inference, providing a biological analogue to a Kalman filter.
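For readers unfamiliar with the baseline named here, the following is a minimal textbook sketch of a scalar Kalman filter tracking a single heading angle. It is not the neural field model from the thesis, and the function names, the one-dimensional state, and the noise parameters `q` and `r` are assumptions made for illustration.

```python
def kalman_predict(x, p, rate, dt, q):
    """Propagate the heading estimate using a self-motion cue.

    x: heading estimate, p: its variance, rate: measured angular rate,
    dt: time step, q: process noise variance added per step.
    """
    return x + rate * dt, p + q

def kalman_update(x, p, z, r):
    """Correct the estimate with an external heading measurement z
    whose noise variance is r."""
    gain = p / (p + r)
    return x + gain * (z - x), (1.0 - gain) * p
```

When no external input is available, only `kalman_predict` runs, so the variance `p` grows at every step. That is the sensory-deprived situation the abstract describes, in which the estimate has to be carried by self-generated motion cues alone.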
|
7 |
Relative Pose Estimation Using Non-overlapping Multicamera Clusters
Tribou, Michael John January 2014
This thesis considers the Simultaneous Localization and Mapping (SLAM) problem using a set of perspective cameras arranged such that there is no overlap in their fields-of-view. With the known and fixed extrinsic calibration of each camera within the cluster, a novel real-time pose estimation system is presented that is able to accurately track the motion of a camera cluster relative to an unknown target object or environment and concurrently generate a model of the structure, using only image-space measurements. A new parameterization for point feature position using a spherical coordinate update is presented which isolates system parameters dependent on global scale, allowing the shape parameters of the system to converge despite the scale parameters remaining uncertain. Furthermore, a flexible initialization scheme is proposed which allows the optimization to converge accurately using only the measurements from the cameras at the first time step. An analysis is presented identifying the configurations of the cluster motions and target structure geometry for which the optimization solution becomes degenerate and the global scale is ambiguous. Results are presented that not only confirm the previously known critical motions for a two-camera cluster, but also provide a complete description of the degeneracies related to the point feature constellations. The proposed algorithms are implemented and verified in experiments with a camera cluster constructed using multiple perspective cameras mounted on a quadrotor vehicle and augmented with tracking markers to collect high-precision ground-truth motion measurements from an optical indoor positioning system. The accuracy and performance of the proposed pose estimation system are confirmed for various motion profiles in both indoor and challenging outdoor environments.
|
8 |
Recognition using tagged objects
Soh, Ling Min January 2000
This thesis describes a method for the recognition of objects in an unconstrained environment with widely ranging illumination, imaged from unknown viewpoints and against a complicated background. The general problem is simplified by placing specially designed patterns on the object that allow us to solve the pose determination problem easily. There are several key components involved in the proposed recognition approach. They include pattern detection, pose estimation, model acquisition and matching, and searching and indexing the model database. Other crucial issues pertaining to the individual components of the recognition system are addressed, such as the choice of the pattern, the reliability and accuracy of the pattern detector, pose estimator and matching, and the speed of the overall system. After establishing the methodological framework, experiments are carried out on a wide range of both synthetic and real data to illustrate the validity and usefulness of the proposed methods. The principal contribution of this research is a methodology for Tagged Object Recognition (TOR) in unconstrained conditions. A robust pattern (calibration chart) detector is developed for off-the-shelf use. To empirically assess the effectiveness of the pattern detector and the pose estimator under various scenarios, simulated data generated using a graphics rendering process is used. This simulated data provides ground truth, which is difficult to obtain in projected images. Using the ground truth, the detection error, which is usually ignored, can be analysed. For model matching, the Chamfer matching algorithm is modified to get a more reliable matching score. The technique facilitates reliable TOR. Finally, the results of extensive quantitative and qualitative tests are presented that show the plausibility of practical use of TOR.
The features characterising the enabling technology developed are the ability to a) recognise an object which is tagged with the calibration chart, b) establish camera position with respect to a landmark and c) test any camera calibration and 3D pose estimation routines, thus facilitating future research and applications in mobile robot navigation, 3D reconstruction and stereo vision.
|
9 |
Digital Twin Coaching for Edge Computing Using Deep Learning Based 2D Pose Estimation
Gámez Díaz, Rogelio 15 April 2021
In these challenging times caused by COVID-19, technology that leverages the potential of Artificial Intelligence (AI) can help people cope with the pandemic, for example by supporting people who want to exercise while in quarantine. We also find another opportunity in the widespread adoption of mobile smart devices, which makes complex AI models accessible to the average user.
Taking advantage of this situation, we propose a Smart Coaching experience on the Edge with our Digital Twin Coaching (DTC) architecture. Since the general population is advised to work from home, sedentary behavior has become prevalent. Coaching is a positive force in exercising, but keeping physical distance while exercising is a significant problem. Therefore, a Smart Coach can help in this scenario as it involves using smart devices instead of direct communication with another person. Some researchers have worked on Smart Coaching, but their systems often involve complex devices such as RGB-Depth cameras, making them cumbersome to use. Our approach is one of the first to focus on everyday smart devices, like smartphones, to solve this problem.
Digital Twin Coaching can be defined as a virtual system designed to help people improve in a specific field and is a powerful tool if combined with edge technology. The DTC architecture has six characteristics that we try to fulfill: adaptability, compatibility, flexibility, portability, security, and privacy.
Since there was no dataset of coach-trainee videos, we collected training data from 10 subjects using a 2D pose estimation model. To use this information effectively, the most critical pre-processing step was synchronization. This step synchronizes the coach's and the trainee’s poses to overcome the trainee's action lag while performing the routine in real time.
We trained a light neural network called “Pose Inference Neural Network” (PINN) to serve as a fine-tuning mechanism. With this trained neural network we improved the generalist 2D pose estimation model while keeping the time complexity relatively unaffected. We also propose an Angular Pose Representation for comparing the trainee's and the coach's stances that accounts for differences in people's body proportions.
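One way to see why an angle-based representation is insensitive to body proportions is to compute the angle at a joint from three 2D keypoints. The abstract does not give the thesis's exact Angular Pose Representation, so the sketch below is an assumption about its general idea, and `joint_angle` is a hypothetical helper.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b between segments b->a and b->c.

    a, b, c are (x, y) keypoints, e.g. shoulder, elbow and wrist.
    The result depends only on segment directions, not their lengths.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))
```

Two people holding the same stance with different limb lengths give the same angle, so a coach and a trainee can be compared joint by joint without first rescaling their skeletons.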
For the PINN model, we use Random Search Optimization to find the best configuration. The configurations tested included using 1, 2, 3, 4, 5, and 10 layers. We chose the 2-Layer Neural Network (2-LNN) configuration because it was the fastest to train and predict while providing a fair tradeoff between performance and resource consumption. Using frame synchronization in pre-processing, we improved the test loss (Mean Squared Error) by 76% while training with the 2-LNN. The PINN improved the R2 score of the PoseNet model by at least 15% and at most 93% depending on the configuration. Our approach only added 4 seconds (roughly 2% of the total time) to the total processing time on average. Finally, the usability test results showed that our Proof of Concept application, DTCoach, was considered easy to learn and convenient to use. At the same time, some participants mentioned that they would like more features and improved clarity before they would use the app frequently.
We hope DTCoach can help people stay more active, especially in quarantine, as the application can serve as a motivator. Since it can be run on modern smartphones, it can quickly be adopted by many people.
|
10 |
Performance Enhancements of the Spin-Image Pose Estimation Algorithm
Gerlach, Adam R. 12 April 2010
No description available.
|