TOWARDS IMPROVED REPRESENTATIONS ON HUMAN ACTIVITY UNDERSTANDING

Human action recognition stands as a cornerstone in the domain of computer vision, with its utility spanning across emergency response, sign language interpretation, and the burgeoning fields of augmented and virtual reality. The transition from conventional video-based recognition to skeleton-based methodologies has been a transformative shift, offering a robust alternative less susceptible to environmental noise and more focused on the dynamics of human movement.

This body of work encapsulates the evolution of action recognition, emphasizing the pivotal role of Graph Convolution Network (GCN) based approaches, particularly through the innovative InfoGCN framework. InfoGCN has set a new precedent in the field by introducing an information bottleneck-based learning objective, a self-attention graph convolution module, and a multi-modal representation of the human skeleton. These advancements have collectively elevated the accuracy and efficiency of action recognition systems.

Addressing the prevalent challenge of occlusions, particularly in single-camera setups, the Pose Relation Transformer (PORT) framework has been introduced. Inspired by the principles of Masked Language Modeling in natural language processing, PORT refines the detection of occluded joints, thereby enhancing the reliability of pose estimation under visually obstructive conditions.

Building upon the foundations laid by InfoGCN, the Skeleton ODE framework has been developed for online action recognition, enabling real-time inference without the need for complete action observation. By integrating Neural Ordinary Differential Equations, Skeleton ODE facilitates the prediction of future movements, thus reducing latency and paving the way for real-time applications.

The implications of this research are vast, indicating a future where real-time, efficient, and accurate human action recognition systems could significantly impact various sectors, including healthcare, autonomous vehicles, and interactive technologies. Future research directions point towards the integration of multi-modal data, the application of transfer learning for enhanced generalization, the optimization of models for edge computing, and the ethical deployment of action recognition technologies. The potential for these systems to contribute to healthcare, particularly in patient monitoring and disease detection, underscores the need for continued interdisciplinary collaboration and innovation.

10.25394/pgs.24718260.v1

Computer vision

Human Action Recognition

Representation Learning

Identifier	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/24718260
Date	04 December 2023
Creators	Hyung-gun Chi (17543172)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/TOWARDS_IMPROVED_REPRESENTATIONS_ON_HUMAN_ACTIVITY_UNDERSTANDING/24718260
