
Automated Mental Disorders Assessment Using Machine Learning

Mental and behavioural disorders such as bipolar disorder and depression are critical healthcare issues that affected approximately 45 million and 264 million people worldwide, respectively, in 2020. Early detection and intervention are crucial for limiting the negative effects that these illnesses can have on people's lives.
Although the symptoms of different mental disorders vary, they are generally characterized by a combination of abnormal behaviours, thoughts, and emotions. Mental disorders can affect one's ability to relate to others and to function in everyday life. To assess symptoms, clinicians often use structured clinical interviews and standardized questionnaires. However, there is a scarcity of automated or technology-assisted tools that can simplify the diagnostic process.
The main objective of this thesis is to investigate, develop, and propose automated methods for mental disorder detection. Our research focuses on bipolar disorder and depression, as they are two of the most common and debilitating mental illnesses.
Bipolar disorder is one of the most prevalent mental illnesses in the world. Its principal indicator is extreme mood swings ranging from manic to depressive states. We propose automatic ternary classification models for bipolar disorder manic states. We employ a dataset that uses the Young Mania Rating Scale to distinguish the manic states of patients as Mania, Hypomania, and Remission. The dataset comprises audio-visual recordings of bipolar disorder patients undergoing a structured interview.
We propose three bipolar disorder classification solutions. The first approach uses a hybrid LSTM-CNN model: a CNN extracts facial features from the video signal, and the resulting feature sequence is supplied to an LSTM that resolves the bipolar disorder state. Our solution achieved promising results on the development and test sets of the Turkish Audio-Visual Bipolar Disorder Corpus, with Unweighted Average Recalls (UAR) of 60.67% and 57.4%, respectively.
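As an illustration only, not the thesis implementation, the sketch below shows the general shape of such a hybrid model, assuming PyTorch; the layer sizes, frame resolution, and clip length are illustrative assumptions.

```python
# Minimal sketch (assumed PyTorch; sizes are illustrative, not the thesis settings).
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Hybrid model: a small CNN encodes each video frame into a feature
    vector, and an LSTM aggregates the frame sequence into one of the
    three manic states (mania, hypomania, remission)."""

    def __init__(self, hidden_size=128, num_classes=3):
        super().__init__()
        # Per-frame facial feature extractor (placeholder architecture).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # -> (batch*time, 32, 1, 1)
            nn.Flatten(),                 # -> (batch*time, 32)
        )
        # Sequence model over the per-frame features.
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])         # logits over the three manic states

# Example: 2 clips of 16 RGB frames at 64x64 resolution.
logits = CNNLSTMClassifier()(torch.randn(2, 16, 3, 64, 64))
```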
The second solution employs additional features from the structured interview recordings. We acquire visual representations along with audio and textual cues. We extract Mel-Frequency Cepstral Coefficients (MFCCs) and the Geneva Minimalistic Acoustic Parameter Set as audio features, and compute linguistic and sentiment features from each subject's transcript. We present a stacked ensemble classifier that classifies the fused features after feature selection: a set of three homogeneous CNNs and an MLP constitute the first and second levels of the ensemble, respectively. Moreover, we use reinforcement learning to optimize the networks and their hyperparameters. Our stacked ensemble solution outperforms existing models on the Turkish Audio-Visual Bipolar Disorder corpus with a 59.3% unweighted average recall on the test set. To the best of our knowledge, this is the highest performance achieved on this dataset.
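The following sketch conveys the stacking idea only, under stated assumptions: PyTorch, illustrative layer sizes, and the reinforcement-learning hyperparameter search omitted. Three homogeneous 1-D CNNs score the fused feature vector, and an MLP meta-learner combines their class scores.

```python
# Minimal sketch of a two-level stacked ensemble (not the thesis code).
import torch
import torch.nn as nn

def make_base_cnn(num_features, num_classes=3):
    """One level-1 learner: a small 1-D CNN over the fused feature vector."""
    return nn.Sequential(
        nn.Unflatten(1, (1, num_features)),                 # (batch, 1, num_features)
        nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(8, num_classes),
    )

class StackedEnsemble(nn.Module):
    """Level 1: three homogeneous CNNs; level 2: an MLP meta-learner that
    combines their class scores into the final manic-state prediction."""

    def __init__(self, num_features, num_classes=3, n_base=3):
        super().__init__()
        self.base = nn.ModuleList(make_base_cnn(num_features, num_classes)
                                  for _ in range(n_base))
        self.meta = nn.Sequential(
            nn.Linear(n_base * num_classes, 32), nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        # Concatenate the base CNNs' class probabilities, then let the MLP stack them.
        stacked = torch.cat([b(x).softmax(dim=1) for b in self.base], dim=1)
        return self.meta(stacked)

# Example: a batch of 4 fused (audio + visual + textual) feature vectors.
logits = StackedEnsemble(num_features=256)(torch.randn(4, 256))
```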
The Turkish Audio-Visual Bipolar Disorder dataset comprises a relatively small number of videos, and the labels of the test set are kept confidential by the dataset provider. This motivated us to train the third solution with a semi-supervised ladder network, which benefits from unlabelled data during training. Our goal was to investigate whether a classifier of bipolar disorder states can be trained on a mix of labelled and unlabelled data, which would alleviate the burden of labelling all the videos in the training set. We collect informative audio, visual, and textual features from the recordings to realize a multi-modal classifier of the manic states. The third proposed model achieved unweighted average recalls of 53.7% and 60.0% on the test and development sets, respectively.
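A full ladder network, with its per-layer lateral connections and denoising costs, is beyond an abstract-level sketch; the simplified example below only conveys the semi-supervised principle it relies on: labelled clips contribute a supervised loss, while all clips, labelled or not, contribute a reconstruction loss. The encoder/decoder shapes and weights are illustrative assumptions.

```python
# Simplified semi-supervised training step (assumed PyTorch; the ladder
# network's per-layer lateral connections are intentionally omitted).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(256, 64), nn.ReLU())   # fused multimodal features -> latent
classifier = nn.Linear(64, 3)                            # mania / hypomania / remission
decoder = nn.Sequential(nn.Linear(64, 256))              # reconstructs the clean input

ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
params = [*encoder.parameters(), *classifier.parameters(), *decoder.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

def training_step(x_lab, y_lab, x_unlab, noise_std=0.1, recon_weight=0.5):
    opt.zero_grad()
    # Supervised path: classify noisy labelled examples.
    logits = classifier(encoder(x_lab + noise_std * torch.randn_like(x_lab)))
    loss = ce(logits, y_lab)
    # Unsupervised path: reconstruct the clean features from a noisy encoding,
    # using labelled and unlabelled examples alike.
    x_all = torch.cat([x_lab, x_unlab])
    recon = decoder(encoder(x_all + noise_std * torch.randn_like(x_all)))
    loss = loss + recon_weight * mse(recon, x_all)
    loss.backward()
    opt.step()
    return loss.item()

# Example: 8 labelled and 32 unlabelled fused feature vectors per step.
training_step(torch.randn(8, 256), torch.randint(0, 3, (8,)), torch.randn(32, 256))
```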
There is also a growing demand for automated depression detection systems to control subjective bias in diagnosis. We propose an automated depression severity detection model that uses multi-modal fusion of audio and textual information. We train the model on the E-DAIC corpus, which labels each individual's depression level with a Patient Health Questionnaire score. We use MFCCs and eGeMAPS as audio representations and Word2Vec embeddings for the textual modality. Then, we implement a stacked ensemble regressor to estimate depression severity. The proposed model achieves a concordance correlation coefficient (CCC) of 0.49 on the test set. To the best of our knowledge, this is the highest-performing model on this dataset.
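For reference, the evaluation metric can be computed as follows. This is a generic implementation of the concordance correlation coefficient, not the thesis code, and the questionnaire scores in the example are made up.

```python
# Concordance correlation coefficient between predicted and reference scores.
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()          # population variances
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Example with made-up depression questionnaire scores.
print(concordance_correlation_coefficient([3, 10, 17, 21], [4, 9, 15, 23]))
```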

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/43014
Date: 13 December 2021
Creators: Abaei Koupaei, Niloufar
Contributors: Al Osman, Hussein
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d'Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
