Handling Imbalanced Data Classification With Variational Autoencoding And Random Under-Sampling Boosting

This thesis compares three pre-processing methods for imbalanced classification data. A Variational Autoencoder (VAE), Random Under-Sampling Boosting (RUSBoost), and a hybrid of the two are applied to three classification data sets with different degrees of class imbalance. A logistic regression (LR) model is fitted to each pre-processed data set, and the pre-processing methods are evaluated on its classification performance. All three methods show indications of different advantages when handling class imbalance. For each pre-processed data set, the LR model is better at correctly classifying minority-class observations than an LR model fitted to the original imbalanced data set. In terms of overall classification performance, both VAE and RUSBoost improve the results, while the hybrid method performs worse on the moderately imbalanced data and best on the highly imbalanced data.
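
As a rough illustration of the kind of pre-processing and evaluation the abstract describes, the sketch below shows only the random under-sampling step and a minority-class recall measure. It is not the thesis's implementation: it leaves out the boosting part of RUSBoost, the VAE, and the LR model, and the function names (`random_under_sample`, `minority_recall`), the label coding (1 = minority), and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def random_under_sample(X, y, minority_label=1, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper for illustration; assumes a binary label vector.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_idx = [i for i, label in enumerate(y) if label != minority_label]
    # Sample majority rows without replacement, one per minority row.
    n_keep = min(len(minority_idx), len(majority_idx))
    kept_majority = rng.sample(majority_idx, k=n_keep)
    keep = sorted(minority_idx + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


def minority_recall(y_true, y_pred, minority_label=1):
    """Share of true minority observations that were predicted as minority."""
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred)
        if t == minority_label and p == minority_label
    )
    actual_pos = sum(1 for t in y_true if t == minority_label)
    return true_pos / actual_pos if actual_pos else 0.0


# Toy data (made up for this sketch): 95 majority rows and 5 minority rows.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

X_bal, y_bal = random_under_sample(X, y)
class_counts = Counter(y_bal)  # class sizes after under-sampling
```

In a comparison like the one described, a classifier would be fitted once to the original data and once to the balanced data, and a measure such as `minority_recall` would be computed on held-out observations for each fit.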

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-412838

Imbalanced classification data

Variational Autoencoder

Random Under-Sampling Boosting

Credit Card Fraud

Fraud

Mammography

Other Social Sciences not elsewhere specified

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-412838
Date	January 2020
Creators	Ludvigsen, Jesper
Publisher	Uppsala universitet, Statistiska institutionen
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
