
Multimodal deep learning systems for analysis of human behavior, preference, and state

Deep learning has become a widely used tool for inference and prediction in neuroscience research. Despite their differences, most neural network architectures convert raw input data into lower-dimensional vector representations that subsequent network layers can more easily process. Significant advances have been made in improving latent representations for audiovisual problems. However, human neurophysiological data is often scarcer, noisier, and more challenging to learn from when integrated from multiple sources. The present work integrates neural, physiological, and behavioral data to improve the prediction of human behavior, preference, and state. Across five studies, we explore (i) how embeddings, or vectorized representations, can be designed to better capture the context of input data, (ii) how the attention mechanism found in transformer models can be adapted to capture crossmodal relationships in an interpretable way, and (iii) how humans make sensorimotor decisions in a realistic scenario, with implications for the design of automated systems.
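As a minimal illustration of the embedding idea (a sketch, not code from the dissertation), the following PyTorch snippet projects two hypothetical modalities, stand-ins for EEG and pupil features with made-up dimensionalities, into a shared low-dimensional vector space that downstream layers can process.

    import torch
    import torch.nn as nn

    class ModalityEmbedder(nn.Module):
        """Projects raw features from one modality into a shared embedding space."""
        def __init__(self, input_dim, embed_dim=64):
            super().__init__()
            # A single linear projection stands in for any modality-specific encoder.
            self.proj = nn.Linear(input_dim, embed_dim)

        def forward(self, x):
            return self.proj(x)

    # Hypothetical feature dimensionalities, chosen only for illustration.
    eeg = torch.randn(32, 128)    # batch of 32 samples, 128 EEG-derived features
    pupil = torch.randn(32, 8)    # batch of 32 samples, 8 pupil-derived features

    eeg_embedding = ModalityEmbedder(input_dim=128)(eeg)    # shape: (32, 64)
    pupil_embedding = ModalityEmbedder(input_dim=8)(pupil)  # shape: (32, 64)

Once both modalities share an embedding dimension, they can be concatenated, pooled, or fused by attention, as discussed in the parts that follow.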

Part I focuses on improving the context available to latent representations in deep neural networks. We achieve this by introducing a hierarchical structure over clinical data to predict cognitive performance in a large, longitudinal cohort study. In a separate study, we present a recurrent neural network that captures non-cognitive pupil dynamics by using visual areas of interest as inputs. In Part II, we employ attention-based approaches to multimodal integration, learning to weight modalities that differ in the type of information they capture. We show that our crossmodal attention framework adapts to both audiovisual and neurophysiological input data. Part III proposes a novel paradigm for studying sensorimotor decision-making in a driving scenario and examines brain connectivity in the context of pupil-linked arousal.
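To make the crossmodal attention idea concrete, the sketch below uses PyTorch's built-in nn.MultiheadAttention so that tokens from one modality (the queries) attend over tokens from another (the keys and values). All dimensions and modality names here are illustrative assumptions, not the dissertation's actual architecture.

    import torch
    import torch.nn as nn

    embed_dim, n_heads = 64, 4
    cross_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

    # Hypothetical token sequences already projected to a shared embedding size:
    # e.g., 20 EEG time windows and 10 autonomic/physiological windows per trial.
    eeg_tokens = torch.randn(32, 20, embed_dim)      # (batch, query length, dim)
    physio_tokens = torch.randn(32, 10, embed_dim)   # (batch, key/value length, dim)

    # Queries from one modality attend over keys/values from the other; the
    # returned attention weights can be inspected for interpretability.
    fused, attn_weights = cross_attn(query=eeg_tokens,
                                     key=physio_tokens,
                                     value=physio_tokens)
    # fused: (32, 20, 64); attn_weights: (32, 20, 10), averaged over heads

Inspecting attn_weights shows how strongly each window of one modality draws on each window of the other, which is one way such a framework can remain interpretable.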

Our findings reveal that embeddings that capture input data's hierarchical or temporal context consistently yield high performance across different tasks. Moreover, our studies demonstrate the versatility of the attention mechanism, which we show can effectively integrate various modalities such as text descriptions, perceived differences in video clips, and recognized objects. Our multimodal transformer, designed to handle neurophysiological data, improves the prediction of emotional states by integrating brain and autonomic activity. Taken together, our work advances the development of multimodal systems for predicting human behavior, preference, and state across domains.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/davv-2e47
Date: January 2023
Creators: Koorathota, Sharath Chandra
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
