
Modification of the RusBoost algorithm : A comparison of classifiers on imbalanced data / Modifikation av RusBoost algoritmen : En jämförelse av klassificeringsmetoder på obalanserad data

In many situations data is imbalanced, meaning that one class makes up a larger proportion of the observations than the other(s). Standard classifiers often produce undesirable results when the data is imbalanced, and different methods have been developed in an attempt to improve classification under such conditions. Examples are the algorithms AdaBoost, RusBoost, and SmoteBoost, which modify the cost for misclassified observations; the latter two also reduce the class imbalance when training the classifier. This thesis presents a new method, Modified RusBoost, in which the RusBoost algorithm is modified so that observations that are harder to classify correctly are assigned a lower probability of being removed in the under-sampling process. The performance of this method was compared with that of AdaBoost, RusBoost, and SmoteBoost on 20 real imbalanced data sets. How imbalance affects the different classifiers was also investigated. Overall, Modified RusBoost performed better than or comparably to the other methods, indicating that this algorithm can be a good alternative when classifying imbalanced data. The results also showed that an increase in ρ, the ratio of majority to minority observations in a data set, has a negative impact on the performance of the algorithms. However, this negative impact affects the performance of all methods similarly.
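The under-sampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how "harder to classify" is measured or how removal probabilities are computed, so everything here is an assumption: difficulty is proxied by a boosting-style observation weight, majority observations are kept with probability proportional to that weight, and the function names `imbalance_ratio` and `weighted_undersample` as well as the `target_rho` parameter are hypothetical. It is not the thesis's implementation.

```python
import numpy as np

def imbalance_ratio(y, minority_label=1):
    """rho = (# majority observations) / (# minority observations)."""
    y = np.asarray(y)
    n_min = int(np.sum(y == minority_label))
    n_maj = y.size - n_min
    return n_maj / n_min

def weighted_undersample(y, weights, minority_label=1, target_rho=1.0, rng=None):
    """Return sorted indices of the observations kept after under-sampling.

    Assumption (not from the thesis): a larger weight marks a harder
    observation, so majority observations are kept with probability
    proportional to their weight, i.e. hard ones are less likely removed.
    All minority observations are always kept.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    weights = np.asarray(weights, dtype=float)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    # Number of majority observations to keep so that rho is about target_rho.
    n_keep = min(maj_idx.size, int(round(target_rho * min_idx.size)))
    p = weights[maj_idx] / weights[maj_idx].sum()
    kept_maj = rng.choice(maj_idx, size=n_keep, replace=False, p=p)
    return np.sort(np.concatenate([min_idx, kept_maj]))

# Toy data: 8 majority (label 0) and 2 minority (label 1) observations.
y = np.array([0] * 8 + [1] * 2)
w = np.full(10, 0.1)
w[0] = 0.5  # pretend observation 0 was misclassified in earlier rounds
idx = weighted_undersample(y, w, rng=0)
print(imbalance_ratio(y))       # 4.0
print(imbalance_ratio(y[idx]))  # 1.0
```

In a full boosting loop the weights would be updated after each round and the sampling repeated; plain RusBoost would correspond to drawing the majority subset uniformly rather than by weight.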

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-196899
Date: January 2022
Creators: Forslund, Isak
Publisher: Umeå universitet, Statistik
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
