1 |
Fundus-DeepNet: Multi-Label Deep Learning Classification System for Enhanced Detection of Multiple Ocular Diseases through Data Fusion of Fundus Images. Al-Fahdawi, S., Al-Waisy, A.S., Zeebaree, D.Q., Qahwaji, Rami S.R., Natiq, H., Mohammed, M.A., Nedoma, J., Martinek, R., Deveci, M., 29 September 2023.
Detecting multiple ocular diseases in fundus images is crucial in ophthalmic diagnosis. This study introduces the Fundus-DeepNet system, an automated multi-label deep learning classification system designed to identify multiple ocular diseases by integrating feature representations from pairs of fundus images (e.g., left and right eyes). The pipeline begins with a comprehensive image pre-processing procedure, including circular border cropping, image resizing, contrast enhancement, noise removal, and data augmentation. Discriminative deep feature representations are then extracted using multiple deep learning blocks, namely the High-Resolution Network (HRNet) and an Attention Block, which serve as feature descriptors. A SENet Block is subsequently applied to enhance the quality and robustness of the feature representations from the pair of fundus images and to consolidate them into a single feature representation. Finally, a Discriminative Restricted Boltzmann Machine (DRBM) classifier with a Softmax layer generates a probability distribution over eight ocular diseases. Extensive experiments were conducted on the challenging Ophthalmic Image Analysis-Ocular Disease Intelligent Recognition (OIA-ODIR) dataset, comprising diverse fundus images depicting eight ocular diseases. The Fundus-DeepNet system achieved F1-scores, Kappa scores, AUC, and final scores of 88.56%, 88.92%, 99.76%, and 92.41% on the off-site test set, and 89.13%, 88.98%, 99.86%, and 92.66% on the on-site test set. In summary, the Fundus-DeepNet system accurately detects multiple ocular diseases, offering a promising solution for early diagnosis and treatment in ophthalmology.
Funding: European Union under the REFRESH – Research Excellence for Region Sustainability and High-tech Industries project, number CZ.10.03.01/00/22_003/0000048, via the Operational Program Just Transition; and the Ministry of Education, Youth, and Sports of the Czech Republic - Technical University of Ostrava, Czechia, under Grants SP2023/039 and SP2023/042.
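To make the described pipeline concrete, here is a minimal PyTorch-style sketch of the pair-wise fusion pattern it follows: a shared backbone embeds each eye's image, the two embeddings are concatenated, re-weighted by a squeeze-and-excitation gate, and mapped to eight per-disease probabilities. It is an illustration only; the generic CNN, SE layer, and linear head are stand-ins for the paper's HRNet, Attention Block, SENet Block, and DRBM classifier, and all sizes are assumed.

```python
# Illustrative sketch (not the authors' code): pair-wise fundus feature fusion.
# A small generic CNN stands in for HRNet + Attention Block, a squeeze-and-excitation
# (SE) layer stands in for the SENet Block, and a linear multi-label head stands in
# for the DRBM + Softmax classifier. All dimensions are placeholders.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation on a feature vector: learns per-channel weights."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, channels)
        return x * self.gate(x)


class PairFusionClassifier(nn.Module):
    """Shared backbone on left/right images, SE re-weighting, multi-label output."""

    def __init__(self, feat_dim: int = 256, num_diseases: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(           # placeholder for HRNet + Attention Block
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.se = SEBlock(2 * feat_dim)
        self.head = nn.Linear(2 * feat_dim, num_diseases)  # placeholder for the DRBM

    def forward(self, left_eye, right_eye):
        fused = torch.cat([self.backbone(left_eye), self.backbone(right_eye)], dim=1)
        return torch.sigmoid(self.head(self.se(fused)))     # one probability per disease


left = torch.randn(2, 3, 224, 224)               # a batch of two left-eye images
right = torch.randn(2, 3, 224, 224)              # and their right-eye counterparts
print(PairFusionClassifier()(left, right).shape)  # torch.Size([2, 8])
```

A multi-label head like this would typically be trained with a per-disease binary cross-entropy loss, since a single fundus pair can carry more than one diagnosis.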
2 |
Recognizing emotions in spoken dialogue with acoustic and lexical cues. Tian, Leimin, January 2018.
Automatic emotion recognition has long been a focus of Affective Computing. It has become increasingly apparent that awareness of human emotions in Human-Computer Interaction (HCI) is crucial for advancing related technologies, such as dialogue systems. However, the performance of current automatic emotion recognition is disappointing compared to human performance. Current research on emotion recognition in spoken dialogue focuses on identifying better feature representations and recognition models from a data-driven point of view. The goal of this thesis is to explore how incorporating prior knowledge of human emotion recognition into the automatic model can improve state-of-the-art performance of automatic emotion recognition in spoken dialogue. Specifically, we study this by proposing knowledge-inspired features representing occurrences of disfluency and non-verbal vocalisation in speech, and by building a multimodal recognition model that combines acoustic and lexical features in a knowledge-inspired hierarchical structure. In our study, emotions are represented with the Arousal, Expectancy, Power, and Valence emotion dimensions. We build unimodal and multimodal emotion recognition models to study the proposed features and modelling approach, and perform emotion recognition on both spontaneous and acted dialogue.
Psycholinguistic studies have suggested that DISfluency and Non-verbal Vocalisation (DIS-NV) in dialogue are related to emotions. However, these affective cues in spoken dialogue are overlooked by current automatic emotion recognition research. Thus, we propose features for recognizing emotions in spoken dialogue which describe five types of DIS-NV in utterances, namely filled pause, filler, stutter, laughter, and audible breath. Our experiments show that this small set of features is predictive of emotions. Our DIS-NV features achieve better performance than benchmark acoustic and lexical features for recognizing all emotion dimensions in spontaneous dialogue. Consistent with psycholinguistic studies, the DIS-NV features are especially predictive of the Expectancy dimension of emotion, which relates to speaker uncertainty. Our study illustrates the relationship between DIS-NVs and emotions in dialogue, which contributes to psycholinguistic understanding of them as well.
Note that our DIS-NV features are based on manual annotations, yet our long-term goal is to apply our emotion recognition model to HCI systems. Thus, we conduct preliminary experiments on automatic detection of DIS-NVs, and on using automatically detected DIS-NV features for emotion recognition. Our results show that DIS-NVs can be automatically detected from speech with stable accuracy, and that auto-detected DIS-NV features remain predictive of emotions in spontaneous dialogue. This suggests that our emotion recognition model can be applied to a fully automatic system in the future, and holds the potential to improve the quality of emotional interaction in current HCI systems.
To study the robustness of the DIS-NV features, we conduct cross-corpora experiments on both spontaneous and acted dialogue, and identify how dialogue type influences the performance of the DIS-NV features and emotion recognition models. Because DIS-NVs carry information beyond acoustic characteristics or lexical contents, we also study the gain of modality fusion for emotion recognition with the DIS-NV features.
Previous work combines different feature sets by fusing modalities at the same level, using two types of fusion strategies: Feature-Level (FL) fusion, which concatenates feature sets before recognition, and Decision-Level (DL) fusion, which makes the final decision based on the outputs of all unimodal models. However, features from different modalities may describe data at different time scales or levels of abstraction. Moreover, Cognitive Science research indicates that when perceiving emotions, humans make use of information from different modalities at different cognitive levels and time steps. Therefore, we propose a HierarchicaL (HL) fusion strategy for multimodal emotion recognition, which places features that describe data over longer time intervals, or that are more abstract, at higher levels of its knowledge-inspired hierarchy. Compared to FL and DL fusion, HL fusion incorporates both inter- and intra-modality differences. Our experiments show that HL fusion consistently outperforms FL and DL fusion on multimodal emotion recognition in both spontaneous and acted dialogue. The HL model combining our DIS-NV features with benchmark acoustic and lexical features improves the current performance of multimodal emotion recognition in spoken dialogue.
To study how other emotion-related tasks of spoken dialogue can benefit from the proposed approaches, we apply the DIS-NV features and the HL fusion strategy to recognize movie-induced emotions. Our experiments show that although designed for recognizing emotions in spoken dialogue, the DIS-NV features and HL fusion remain effective for recognizing movie-induced emotions, suggesting that other emotion-related tasks can also benefit from the proposed features and model structure.
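For readers new to the terminology, the sketch below contrasts the three fusion strategies in PyTorch. The encoders, feature dimensions, and the decision to place acoustic features at the lower level are illustrative assumptions rather than the models evaluated in the thesis.

```python
# Illustrative contrast of the three fusion strategies (placeholder encoders and
# dimensions, not the thesis models). Acoustic features are treated as the
# lower-level, fast-changing stream; lexical features as the more abstract one.
import torch
import torch.nn as nn

D_ACOUSTIC, D_LEXICAL, D_HID, N_DIM = 40, 100, 64, 4   # 4 emotion dimensions


class FeatureLevelFusion(nn.Module):
    """FL: concatenate the feature sets, then train a single recognition model."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Linear(D_ACOUSTIC + D_LEXICAL, D_HID), nn.ReLU(),
                                   nn.Linear(D_HID, N_DIM))

    def forward(self, acoustic, lexical):
        return self.model(torch.cat([acoustic, lexical], dim=-1))


class DecisionLevelFusion(nn.Module):
    """DL: one model per modality; the final decision averages their outputs."""

    def __init__(self):
        super().__init__()
        self.acoustic_model = nn.Linear(D_ACOUSTIC, N_DIM)
        self.lexical_model = nn.Linear(D_LEXICAL, N_DIM)

    def forward(self, acoustic, lexical):
        return (self.acoustic_model(acoustic) + self.lexical_model(lexical)) / 2


class HierarchicalFusion(nn.Module):
    """HL: a lower level encodes the acoustic stream; its output joins the more
    abstract lexical features at the higher level of the hierarchy."""

    def __init__(self):
        super().__init__()
        self.lower = nn.Sequential(nn.Linear(D_ACOUSTIC, D_HID), nn.ReLU())
        self.upper = nn.Sequential(nn.Linear(D_HID + D_LEXICAL, D_HID), nn.ReLU(),
                                   nn.Linear(D_HID, N_DIM))

    def forward(self, acoustic, lexical):
        return self.upper(torch.cat([self.lower(acoustic), lexical], dim=-1))


acoustic, lexical = torch.randn(8, D_ACOUSTIC), torch.randn(8, D_LEXICAL)
for fusion in (FeatureLevelFusion(), DecisionLevelFusion(), HierarchicalFusion()):
    print(type(fusion).__name__, fusion(acoustic, lexical).shape)  # each: (8, 4)
```

The point of the hierarchical variant is structural: the shorter-time-scale stream is summarised before it meets the more abstract one, rather than both being concatenated at a single level.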
3 |
A Multi-modal Emotion Recognition Framework Through The Fusion Of Speech With Visible And Infrared Images. Siddiqui, Mohammad Faridul Haque, 29 August 2019.
No description available.
4 |
Hierarchical Fusion Approaches for Enhancing Multimodal Emotion Recognition in Dialogue-Based Systems: A Systematic Study of Multimodal Emotion Recognition Fusion Strategy / Hierarkiska fusionsmetoder för att förbättra multimodal känslomässig igenkänning i dialogbaserade system: En systematisk studie av fusionsstrategier för multimodal känslomässig igenkänning. Liu, Yuqi, January 2023.
Multimodal Emotion Recognition (MER) has gained increasing attention due to its strong performance. In this thesis, we evaluate feature-level fusion, decision-level fusion, and two proposed hierarchical fusion methods for MER systems on a dialogue-based dataset. The first hierarchical approach integrates abstract features across different temporal levels, employing RNN-based and transformer-based context modeling to capture nearby and global context, respectively. The second hierarchical strategy incorporates shared information between modalities by facilitating modality interactions through attention mechanisms. Results show that RNN-based hierarchical fusion surpasses the baseline by 2%, while the transformer-based context modeling and the modality-interaction method improve accuracy by 0.5% and 0.6%, respectively. These findings underscore the importance of capturing meaningful emotional cues in nearby context and emotional invariants in dialogue MER systems, and they highlight the crucial role of the text modality. Overall, this research demonstrates the potential of hierarchical fusion approaches for enhancing MER system performance, presenting systematic strategies supported by empirical evidence.
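As a rough illustration of the RNN-based context-modeling idea evaluated here, the sketch below fuses per-utterance text and audio features and runs a GRU over the utterance sequence, so each prediction can draw on nearby dialogue context. Module names, feature sizes, and the number of emotion classes are assumptions; this is not the thesis implementation.

```python
# Illustrative sketch of RNN-based context modeling over a dialogue (assumed feature
# dimensions, emotion classes, and module layout; not the thesis implementation).
import torch
import torch.nn as nn


class DialogueContextMER(nn.Module):
    """Fuses per-utterance text/audio features, then a GRU adds nearby dialogue context."""

    def __init__(self, d_text=128, d_audio=64, d_hid=64, n_emotions=6):
        super().__init__()
        self.fuse = nn.Linear(d_text + d_audio, d_hid)         # utterance-level fusion
        self.context = nn.GRU(d_hid, d_hid, batch_first=True)  # nearby-context modeling
        self.classify = nn.Linear(d_hid, n_emotions)

    def forward(self, text_feats, audio_feats):
        # text_feats: (batch, n_utterances, d_text); audio_feats: (batch, n_utterances, d_audio)
        utterances = torch.relu(self.fuse(torch.cat([text_feats, audio_feats], dim=-1)))
        context, _ = self.context(utterances)                  # each utterance sees its history
        return self.classify(context)                          # emotion logits per utterance


model = DialogueContextMER()
logits = model(torch.randn(2, 10, 128), torch.randn(2, 10, 64))
print(logits.shape)  # torch.Size([2, 10, 6])
```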