
Use of multiple views for human pose estimation. / CUHK electronic theses & dissertations collection

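A human pose estimation system determines the pose of the human body in space from video images. The main problems such a system faces are: the human pose space is high-dimensional; the depth of the limbs is ambiguous; people can wear many kinds of clothing; and the body is often self-occluded. A multi-camera system observes more data about the same pose and can therefore effectively overcome the uncertainty in human pose estimation. In this research, we study multi-camera human pose estimation using a variety of methods and propose a framework that fuses multiple constraints.

In a multi-camera system, the available constraints include: (1) Image observation: the projection of the estimated pose into the images must agree with the observations from all views, (2) Pose feasibility: body parts must satisfy the body's connectivity constraints and the estimated pose must be consistent with a real human body, (3) 3D rigidity: the body observed from different views must remain spatially consistent, and (4) Action: the pose should be consistent with prior action information. The goal of this research is to develop a multi-camera system that exploits all of the above constraints simultaneously, integrates multiple cameras seamlessly, and estimates 3D human pose robustly and effectively. This thesis studies 3D human pose estimation with a monocular system and proposes a new human model for pose estimation based on constraints (1) and (2); it proposes an affine stereo projection model and uses it to integrate observations from multiple views, so that pose estimation is supported by constraints (1), (2) and (3) simultaneously; it shows how a multi-view activity manifold library can apply all four constraints at once and estimate 3D human pose efficiently; finally, we propose a partial-input Gaussian process based on the manifold library to handle occlusion in human pose estimation.

This thesis makes the following contributions: (1) It proposes the affine stereo projection model for the first time and uses it to describe the 3D rigidity constraint. With this approach, the 3D rigidity constraint can be conveniently integrated into a bottom-up pose estimation framework. (2) It integrates the pose feasibility constraint and the 3D rigidity constraint into a multi-view manifold library. Even in a multi-activity setting, the method maps multi-view observations directly to the human pose space. (3) By jointly analyzing data from multiple views, the system effectively overcomes the self-occlusion problem. (4) The system is easy to extend: both the approach based on the affine stereo projection model and the approach based on the multi-view manifold library can be used in systems with more than three cameras.

A human pose estimation system determines the full human body pose in space from video data alone. Key difficulties of this problem include: full-body kinematics is high-dimensional, limb depths are ambiguous, people wear varied clothing, and self-occlusions are frequent. Using multiple views can make the solution more robust to these uncertainties, because more data are collected about the same pose. In this research, we study multi-view human pose estimation by exploring a variety of approaches and propose a framework that integrates multiple constraints.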

In a multiple view system, the constraints that can be applied to human pose estimation include: (1) Image evidence: the projection of the estimated 3D human body should agree with the 2D observations in all views, (2) Feasible human pose: neighboring body parts should be connected according to the body articulation and all joint angles should stay feasible, (3) 3D object rigidity: corresponding parts across all views should satisfy multi-view consistency, and (4) Action context: the detected results should be in line with prior knowledge about the possible “activities”. The objective of this research is to develop a multiple view system that embeds all the above constraints in a natural way while integrating more cameras into the system seamlessly to enhance robustness. Specifically, we investigate the part-based monocular 3D estimation algorithm and develop a novel human model to assist pose inference based on constraints (1) and (2); we propose an affine stereo model to associate data from multiple views so that body pose inference is supported by constraints (1), (2) and (3) simultaneously; we show how a multi-view activity manifold library can associate multiple views and estimate human pose in 3D efficiently, so that all four constraints are integrated into one framework; and we finally propose a partial-input Gaussian process to handle the body occlusion problem within the manifold library framework.

The thesis makes four contributions: (1) An affine stereo approach is developed to exploit object rigidity efficiently, and this constraint is integrated smoothly into a bottom-up framework. (2) A multi-view visual manifold library is proposed to capture human body articulation and rigidity in the multi-activity context, simplifying pose estimation into a direct mapping from multi-view image evidence to 3D pose. (3) The multi-view system efficiently solves the self-occlusion problem by analyzing data from multiple views.
(4) The multi-view system is designed to be scalable; both the affine stereo based approach and the multi-view visual manifold library based approach can be applied to systems with more than 3 cameras.

Detailed summary in vernacular field only.
Wang, Zibin.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 144-150).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.

ABSTRACT
ABSTRACT (in Chinese)
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter One: Introduction
1.1 Background
1.2 Goals
1.3 Challenges
1.3.1 High Dimensional State Space
1.3.2 Observations
1.3.3 Multiple Views Integration
1.4 Summary of the Approach
1.5 Thesis Overview
Chapter Two: Background
2.1 Top-down Framework
2.1.1 Background Subtraction
2.1.2 Deterministic Approach
2.1.3 Sampling based Approach
2.1.4 Regression based Method
2.2 Bottom-up Framework
2.2.1 Efficient Pictorial Structure
2.2.2 Discriminative Part Detector
2.2.3 Sampling based Inference
2.2.4 Temporal Information
2.3 Human Pose Estimation using Range Sensor
2.4 Conclusion
Chapter Three: Pose Estimation from Single View
3.1 Related Works
3.2 The 3D Human Model
3.3 Acquiring the Appearance Facet
3.3.1 2D Appearance Extraction from Each Training Image
3.3.2 Acquiring 3D Appearance
3.4 Data Driven Belief Propagation for Pose Estimation
3.4.1 A Bayesian Formulation
3.4.2 Belief Propagation
3.4.3 Importance Function Sampling
3.5 Experimental Results
3.6 Conclusion
Chapter Four: Integrating Multiple Views using Affine Stereo Model
4.1 Related Works
4.2 Human Model and Problem Formulation
4.3 Associating Multiple Image Streams
4.3.1 Linear Relation of Multiple Views
4.3.2 Rank Constraint
4.4 Human Pose Estimation System using Multi-view and Other Constraints
4.4.1 Body Part Candidates from Discriminative Body Part Detector
4.4.2 From Body Part Candidates to Body Candidates in each view
4.4.3 Associating Body Candidates across Views
4.5 Experimental Results
4.5.1 Evaluation of the Multi-view Linear Relationship
4.5.2 Performance over the HumanEva Dataset
4.6 Conclusion
Chapter Five: Integrating Multiple Views using Activity Manifold Library
5.1 Related Works
5.2 Multi-view Manifold Library
5.2.1 Body Representation in Space and Views
5.2.2 Human-orientation-dependent Multi-view Visual Manifold
5.3 Human Pose Estimation in 3D via Multi-view Manifold
5.3.1 Find Multi-view Body Hypothesis in 2D
5.3.2 Mutual Selection between Multi-view Body Hypothesises and Manifolds
5.4 Experimental Results
5.4.1 Synthetic Data Test
5.4.2 Real Image Evaluation
5.4.3 Qualitative Test for Generalization Capability
5.4.4 Calculation Speed
5.5 Conclusion
Chapter Six: Partial-Input Gaussian Process for Inferring Occluded Human Pose
6.1 Related Works
6.2 Human-orientation-invariant Multi-view Visual Manifold
6.3 Human Pose estimation in 3D via Multi-view Manifold
6.3.1 2D Pre-processing
6.3.2 Mutual Selection between Multi-view Body Hypothesises and Manifolds
6.3.3 Occlusion Detection and Partial-input Gaussian Process
6.4 Experimental Results
6.4.1 Multi-view Manifolds and Evaluations for Different Views
6.4.2 Evaluation for Occlusion Data
6.4.3 Evaluation for Gavrila’s Dataset
6.4.4 Qualitative Test for Generalization Capability
6.5 Conclusion
Chapter Seven: Conclusions and Future Works
7.1 Conclusion
7.2 Limitation
7.3 Future Directions
Bibliography
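
Constraint (1) in the abstract (image evidence) asks that the projection of an estimated 3D body agree with the 2D observations in every view. A minimal sketch of that check follows, assuming pinhole cameras given as 3x4 projection matrices and a body reduced to a list of 3D joint positions. The function names, camera matrices, and joint coordinates are hypothetical illustrations and are not taken from the thesis, which may score image evidence differently.

```python
from math import hypot

def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 pinhole camera matrix P."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(p * c for p, c in zip(row, xh)) for row in P)
    return (u / w, v / w)

def mean_reprojection_error(cameras, joints_3d, observations):
    """Average 2D distance between projected joints and detections over all views."""
    total, count = 0.0, 0
    for P, obs in zip(cameras, observations):
        for X, (ou, ov) in zip(joints_3d, obs):
            pu, pv = project(P, X)
            total += hypot(pu - ou, pv - ov)
            count += 1
    return total / count

# Two hypothetical cameras: one at the origin, one with its center at x = +1.
P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]

# Hypothetical ground-truth joints and the 2D observations they generate.
joints = [(0.0, 0.0, 4.0), (1.0, 2.0, 4.0)]
obs = [[project(P, X) for X in joints] for P in (P1, P2)]

# The true pose agrees with every view; a shifted pose does not.
error_true = mean_reprojection_error([P1, P2], joints, obs)
shifted = [(x + 0.4, y, z) for x, y, z in joints]
error_shifted = mean_reprojection_error([P1, P2], shifted, obs)
```

In this toy setup a pose hypothesis that matches all views scores zero, while the shifted hypothesis is penalized in both views at once, which is the sense in which additional cameras add evidence about the same pose.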

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328169
Date: January 2012
Contributors: Wang, Zibin, Chinese University of Hong Kong Graduate School. Division of Mechanical and Automation Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: electronic resource, remote, 1 online resource (xix, 150 leaves) : ill.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
