Global ETD Search

1	Imbalanced Data Classification with the K-Closest Resemblance Classifier for Remote Sensing and Social Media Texts
Duan, Cheng
10 November 2020
Data imbalance has been a challenge in many areas of automatic classification. Many popular approaches, including over-sampling, under-sampling, and the Synthetic Minority Oversampling Technique (SMOTE), have been developed and tested in previous research. A major problem with these techniques is that they try to solve the problem by modifying the original data rather than truly overcoming the imbalance and letting the classifiers learn. The imbalanced data challenge also exists for tasks in areas such as remote sensing and depression detection. Researchers have made efforts to overcome it by adopting methods at the data pre-processing step. However, in remote sensing and depression detection tasks, the main interest is still in applying new classifiers, such as deep learning models, which have powerful classification ability but still do not treat data imbalance as a prime factor in lower classification performance. In this thesis, we demonstrate the performance of the K-Closest Resemblance (K-CR) classifier in our evaluation experiments on an urban land cover classification dataset and on two depression detection datasets. The latter two datasets consist of social media texts (tweets); we therefore propose to adopt a feature selection technique, Term Frequency - Category-Based Term Weights (TF-CBTW), and various word embedding techniques (Word2Vec, FastText, GloVe, and the language model BERT). This feature selection method has not previously been applied in similar settings, and we show that it helps to improve the efficiency and the results of the K-CR classifier.
Our three experiments show that K-CR can achieve comparable performance on the majority classes and better performance on minority classes when compared to other classifiers such as Random Forest, K-Nearest Neighbour, Support Vector Machines, Multi-layer Perceptron, Convolutional Neural Networks, and Long Short-Term Memory.
Keywords: K-CR, Prototype-based Classifier, Data Imbalance, Feature Selection, Remote Sensing, Depression Detection
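
The abstract compares classifiers separately on majority and minority classes. As a minimal sketch of why that per-class view matters under imbalance, the following computes overall accuracy next to per-class recall. The toy labels, the always-majority predictor, and the helper name `per_class_recall` are illustrative assumptions and are not taken from the thesis or its datasets.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals = Counter(y_true)
    # Count correct predictions per true label.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy data: 95 majority ("neg") and 5 minority ("pos") samples.
y_true = ["neg"] * 95 + ["pos"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["neg"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(accuracy)  # 0.95
print(recall)    # {'neg': 1.0, 'pos': 0.0}
```

Overall accuracy looks high here even though no minority sample is recognised, which is why results on imbalanced data are usually reported per class.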
