Spelling suggestions: "subject:"matthews correlation coefficient"" "subject:"matthew's correlation coefficient""
1 |
A Comparative Review of SMOTE and ADASYN in Imbalanced Data ClassificationBrandt, Jakob, Lanzén, Emil January 2021 (has links)
In this thesis, the performance of two over-sampling techniques, SMOTE and ADASYN, is compared. The comparison is done on three imbalanced data sets using three different classification models and evaluation metrics, while varying the way the data is pre-processed. The results show that both SMOTE and ADASYN improve the performance of the classifiers in most cases. It is also found that SVM in conjunction with SMOTE performs better than with ADASYN as the degree of class imbalance increases. Furthermore, both SMOTE and ADASYN increase the relative performance of the Random forest as the degree of class imbalance grows. However, no pre-processing method consistently outperforms the other in its contribution to better performance as the degree of class imbalance varies.
|
2 |
Optimising Machine Learning Models for Imbalanced Swedish Text Financial Datasets: A Study on Receipt Classification : Exploring Balancing Methods, Naive Bayes Algorithms, and Performance TradeoffsHu, Li Ang, Ma, Long January 2023 (has links)
This thesis investigates imbalanced Swedish text financial datasets, specifically receipt classification using machine learning models. The study explores the effectiveness of under-sampling and over-sampling methods for Naive Bayes algorithms, collaborating with Fortnox for a controlled experiment. Evaluation metrics compare balancing methods regarding the accuracy, Matthews's correlation coefficient (MCC) , F1 score, precision, and recall. Findings contribute to Swedish text classification, providing insights into balancing methods. The thesis report examines balancing methods and parameter tuning on machine learning models for imbalanced datasets. Multinomial Naive Bayes (MultiNB) algorithms in Natural language processing (NLP) are studied, with potential application in image classification for assessing industrial thin component deformation. Experiments show balancing methods significantly affect MCC and recall, with a recall-MCC-accuracy tradeoff. Smaller alpha values generally improve accuracy. Synthetic Minority Oversampling Technique (SMOTE) and Tomek's algorithm for removing links developed in 1976 by Ivan Tomek. First Tomek, then SMOTE (TomekSMOTE) yield promising accuracy improvements. Due to time constraints, Over-sampling using SMOTE and cleaning using Tomek links. First SMOTE, then Tomek (SMOTETomek) training is incomplete. This thesis report finds the best MCC is achieved when $\alpha$ is 0.01 on imbalanced datasets.
|
3 |
Moderní řečové příznaky používané při diagnóze chorob / State of the art speech features used during the Parkinson disease diagnosisBílý, Ondřej January 2011 (has links)
This work deals with the diagnosis of Parkinson's disease by analyzing the speech signal. At the beginning of this work there is described speech signal production. The following is a description of the speech signal analysis, its preparation and subsequent feature extraction. Next there is described Parkinson's disease and change of the speech signal by this disability. The following describes the symptoms, which are used for the diagnosis of Parkinson's disease (FCR, VSA, VOT, etc.). Another part of the work deals with the selection and reduction symptoms using the learning algorithms (SVM, ANN, k-NN) and their subsequent evaluation. In the last part of the thesis is described a program to count symptoms. Further is described selection and the end evaluated all the result.
|
Page generated in 0.1439 seconds