Over the past decade, the field of affective computing has received increasing attention. With advancements in machine learning, a wide range of methodologies have been developed to better understand human emotions. However, one of the major challenges in this field is accurately modeling emotions on a set of continuous dimensions, such as arousal and valence. This type of modeling is essential to represent complex and subtle emotions, and to capture the full spectrum of human emotional experiences. Additionally, predicting changes in emotions across time series adds another layer of complexity, as emotions can shift continuously.
Our work addresses these challenges using a dataset that includes natural and spontaneous emotions from diverse individuals. We extract multiple features from different modalities, including audio, video, and text, and use them to predict emotions across three axes: arousal, valence, and liking. To achieve this, we employ deep features and multiple fusion techniques to combine the modalities. Our results demonstrate that temporal convolutional networks outperform long short-term memory models in multimodal emotion prediction.
Overall, our research contributes to advancing the field of affective computing by developing more accurate and comprehensive methods for modeling and predicting human emotions.
Identifer | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/45175 |
Date | 19 July 2023 |
Creators | Harb, Hussein |
Contributors | Al Osman, Hussein |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Page generated in 0.0017 seconds