<p> With the success of video recording devices and sharing platforms, visual media has become a significant part of everyday life. Computer vision and machine learning have become the key technologies for organizing and understanding this tremendous amount of visual data. Among the topics in computer vision research, human activity analysis is one of the most challenging and promising areas. It is dedicated to detecting, recognizing, and understanding the context and meaning of human activities in visual media. This dissertation focuses on two aspects of human activity analysis: 1) how to use multi-modality approaches, including depth sensors and traditional RGB cameras, for human action modeling; and 2) how to use more advanced machine learning techniques, such as deep learning and sparse coding, to address more sophisticated problems such as attribute learning and automatic video captioning. </p><p> To explore the use of depth cameras, we first present a depth camera-based image descriptor called the histogram of 3D facets (H3DF) and apply it to human action and hand gesture recognition, and we then present a holistic depth video representation for human actions. To unify the inputs from depth cameras and RGB cameras, this dissertation next discusses a joint framework that models human affect from both facial expressions and body gestures through multi-modality fusion. We then present deep learning-based frameworks for human attribute learning and automatic video captioning. Compared to human action detection and recognition, automatic video captioning is more challenging because it involves complex language models and visual context. Extensive experiments on several public datasets demonstrate that the frameworks proposed in this dissertation outperform state-of-the-art approaches in this research area.</p>
Identifier | oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:10159927 |
Date | 23 November 2016 |
Creators | Zhang, Chenyang |
Publisher | The City College of New York |
Source Sets | ProQuest.com |
Language | English |
Detected Language | English |
Type | thesis |