• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • Tagged with
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A Comparative Review of SMOTE and ADASYN in Imbalanced Data Classification

Brandt, Jakob, Lanzén, Emil January 2021 (has links)
In this thesis, the performance of two over-sampling techniques, SMOTE and ADASYN, is compared. The comparison is done on three imbalanced data sets using three different classification models and evaluation metrics, while varying the way the data is pre-processed. The results show that both SMOTE and ADASYN improve the performance of the classifiers in most cases. It is also found that SVM in conjunction with SMOTE performs better than with ADASYN as the degree of class imbalance increases. Furthermore, both SMOTE and ADASYN increase the relative performance of the Random forest as the degree of class imbalance grows. However, no pre-processing method consistently outperforms the other in its contribution to better performance as the degree of class imbalance varies.
2

Neonatal Sepsis Detection With Random Forest Classification for Heavily Imbalanced Data

Osman Abubaker, Ayman January 2022 (has links)
Neonatal sepsis is associated with most cases ofmortality in the neonatal intensive care unit. Major challengesin detecting sepsis using suitable biomarkers has lead people tolook for alternative approaches in the form of Machine Learningtechniques. In this project, Random Forest classification wasperformed on a sepsis data set provided by Karolinska Hospital.We particularly focused on tackling class imbalance in the datausing sampling and cost-sensitive techniques. We compare theclassification performances of Random Forests in six differentsetups; four using oversampling and undersampling techniques;one using cost-sensitive learning and one basic Random Forest.The performance with the oversampling techniques were betterand could identify more sepsis patients than the other setups.The overall performances were also good, making the methodspotentially useful in practice. / Neonatal sepsis är orsaken till majoriteten av mortaliteten i neonatal intensivvården. Svårigheten i att detektera sepsis med hjälp av biomarkörer har lett många att leta efter alternativa metoder. Maskininlärningstekniker är en sådan alternativ metod som har i senaste tider ökat i användning inom vård och andra sektorer. I detta project användes Random Forest klassifikations algoritmen på en sepsis datamängd given av Karolinska Sjukhuset. Vi fokuserade på att hantera klassimbalansen i datan genom att använda olika provtagningsoch kostnadskänsliga metoder. Vi jämförde klassificeringsprestanda för Random Forest med sex olika inställningar; fyra av de använde provtagingsmetoderna; en av de använde en kostnadskänslig metod och en var en vanlig Random Forest. Det visade sig att modellens prestanda ökade som mest med översamplings metoderna. Den generella klassificeringsprestandan var också bra, vilket gör Random Forests tillsammans med ingsmetoderna potentiellt användbar i praktiken. / Kandidatexjobb i elektroteknik 2022, KTH, Stockholm

Page generated in 0.0199 seconds