
Boosting for Learning From Imbalanced, Multiclass Data Sets

In many real-world applications, it is common to have an uneven number of examples among multiple classes. This data imbalance complicates the learning process, especially for the minority classes, and results in degraded performance. Boosting methods have been proposed to handle the imbalance problem, but they require extended training time and depend on diversity among the classifiers of the ensemble to achieve improved performance. Additionally, extending boosting methods to handle multi-class data sets is not straightforward. Examples of applications that suffer from imbalanced, multi-class data include face recognition, where tens of classes exist, and capsule endoscopy, where the imbalance between classes is severe. This dissertation introduces RegBoost, a new boosting framework that addresses imbalanced, multi-class problems. The method applies a weighted stratified sampling technique and incorporates a regularization term that accommodates multi-class data sets and automatically determines the error bound of each base classifier. The regularization parameter penalizes the classifier when it misclassifies instances that were correctly classified in the previous iteration, and it additionally reduces the bias towards the majority classes. Experiments are conducted using 12 diverse data sets with moderate to high imbalance ratios. The results demonstrate superior performance of the proposed method compared to several state-of-the-art algorithms for imbalanced, multi-class classification problems. More importantly, the sensitivity improvement of the minority classes using RegBoost is accompanied by an improvement in the overall accuracy for all classes. With unpredictability regularization, a diverse group of classifiers is created, and the maximum accuracy improvement exceeds 24%. Using stratified undersampling, RegBoost exhibits the best efficiency, with a reduction in computational cost exceeding 50%. As the volume of training data increases, the efficiency gain of the proposed method becomes more significant.
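
The abstract outlines the main ingredients of RegBoost: weighted stratified sampling per class and a regularization term that penalizes a base classifier for misclassifying instances that the previous round classified correctly. The dissertation itself is not reproduced here, so the Python sketch below is only an illustration of how such a loop might be structured; the function names, the decision-tree base learner, the SAMME-style classifier weight, and the penalty parameter `lam` are assumptions for illustration, not the author's formulation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def stratified_sample(X, y, weights, per_class, rng):
    """Weight-proportional sampling of up to `per_class` instances from each class."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        p = weights[members] / weights[members].sum()
        n = min(per_class, members.size)
        idx.append(rng.choice(members, size=n, replace=False, p=p))
    return np.concatenate(idx)


def regboost_sketch(X, y, n_rounds=10, per_class=50, lam=0.5, seed=0):
    """Hypothetical boosting loop: stratified undersampling plus a penalty (`lam`)
    on instances that were correct in the previous round but are misclassified now."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes = np.unique(y)
    n, K = y.size, classes.size
    w = np.full(n, 1.0 / n)                 # instance weights
    prev_correct = np.zeros(n, dtype=bool)  # correctness mask from the previous round
    ensemble, alphas = [], []
    for t in range(n_rounds):
        sample = stratified_sample(X, y, w, per_class, rng)
        clf = DecisionTreeClassifier(max_depth=3, random_state=seed + t)
        clf.fit(X[sample], y[sample])
        miss = clf.predict(X) != y
        # Extra weight on errors that undo a previously correct decision.
        penalty = 1.0 + lam * (miss & prev_correct)
        err = np.clip(np.sum(w * miss * penalty) / np.sum(w * penalty),
                      1e-10, 1 - 1e-10)
        alpha = np.log((1.0 - err) / err) + np.log(K - 1.0)  # SAMME-style multi-class term
        w = w * np.exp(alpha * miss * penalty)
        w /= w.sum()
        prev_correct = ~miss
        ensemble.append(clf)
        alphas.append(alpha)
    return ensemble, alphas, classes


def predict(ensemble, alphas, classes, X):
    """Weighted vote over the ensemble."""
    X = np.asarray(X)
    votes = np.zeros((X.shape[0], classes.size))
    for clf, a in zip(ensemble, alphas):
        votes[np.arange(X.shape[0]), np.searchsorted(classes, clf.predict(X))] += a
    return classes[np.argmax(votes, axis=1)]
```

A caller would run regboost_sketch(X, y) on label-encoded training data and classify new points with predict(ensemble, alphas, classes, X_new); the per-class sample size caps the influence of the majority classes in each round, which is the intent of the stratified undersampling described above.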

Identifier oai:union.ndltd.org:unt.edu/info:ark/67531/metadc407775
Date 12 1900
Creators Abouelenien, Mohamed
Contributors Yuan, Xiaohui, Swigger, Kathleen, Fu, Song, Liu, Jianguo
Publisher University of North Texas
Source Sets University of North Texas
Language English
Detected Language English
Type Thesis or Dissertation
Format Text
Rights Public, Abouelenien, Mohamed, Copyright, Copyright is held by the author, unless otherwise noted. All rights reserved.
