Global ETD Search

Return to search

Exploring the Feasibility of Machine Learning Techniques in Recognizing Complex Human Activities

This dissertation introduces several technical innovations that improve the ability of machine learning models to recognize a wide range of complex human activities. As human sensor data becomes more abundant, the need to develop algorithms for understanding and interpreting complex human actions has become increasingly important. Our research focuses on three key areas: multi-agent activity recognition, multi-person pose estimation, and multimodal fusion.
To tackle the problem of monitoring coordinated team activities from spatio-temporal traces, we introduce a new framework that incorporates field of view data to predict team performance. Our framework uses Spatial Temporal Graph Convolutional Networks (ST-GCN) and recurrent neural network layers to capture and model the dynamic spatial relationships between agents. The second part of the dissertation addresses the problem of multi-person pose estimation (MPPE) from video data. Our proposed technique (Language Assisted Multi-person Pose estimation) leverages text representations from multimodal foundation models to learn a visual representation that is more robust to occlusion. By infusing semantic information into pose estimation, our approach enables precise estimations, even in cluttered scenes. The final part of the dissertation examines the problem of fusing multimodal physiological input from cardiovascular and gaze tracking sensors to exploit the complementary nature of these modalities. When dealing with multimodal features, uncovering the correlations between different modalities is as crucial as identifying effective unimodal features. This dissertation introduces a hybrid multimodal tensor fusion network that is effective at learning both unimodal and bimodal dynamics.
The outcomes of this dissertation contribute to advancing the field of complex human activity recognition by addressing the challenges associated with multi-agent activity recognition, multi-person pose estimation, and multimodal fusion. The proposed innovations have potential applications in various domains, including video surveillance, human-robot interaction, sports analysis, and healthcare monitoring. By developing intelligent systems capable of accurately recognizing complex human activities, this research paves the way for improved safety, efficiency, and decision-making in a wide range of real-world applications.

Identifer	oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:etd2023-1055
Date	01 January 2023
Creators	Hu, Shengnan
Publisher	STARS
Source Sets	University of Central Florida
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	Graduate Thesis and Dissertation 2023-2024

Page generated in 0.0026 seconds

Exploring the Feasibility of Machine Learning Techniques in Recognizing Complex Human Activities

Description

Links & Downloads

Tags

Additional Fields