1 |
A study of machine learning performance in the prediction of juvenile diabetes from clinical test results
Pobi, Shibendra 01 June 2006
Two approaches to building models for predicting the onset of Type 1 diabetes mellitus in juvenile subjects were examined. In the first, a set of tests performed immediately before diagnosis was used to build classifiers to predict whether the subject would be diagnosed with juvenile diabetes. In the second, a modified training set consisting of differences between test results taken at different times was used to build classifiers for the same prediction. Neural networks were compared with decision trees and ensembles of both types of classifiers. Support Vector Machines were also tested on this dataset. In both cases, the highest known predictive accuracy was obtained when the data was encoded to explicitly indicate missing attributes. In the latter case, high accuracy was achieved without test results which, by themselves, could indicate diabetes. The effects of oversampling minority class samples in the training set by generating synthetic examples were tested with ensemble techniques such as bagging and random forests. Oversampling of diabetic examples led to increased accuracy in diabetic prediction, demonstrated by a significantly better F-measure value. ROC curves and the F-measure were used to compare the performance of the different machine learning algorithms.
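The abstract does not say how the synthetic minority examples were generated or how the F-measure was computed. The following is a minimal sketch, assuming a SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours, together with the standard F-measure (harmonic mean of precision and recall). The function names `synthetic_oversample` and `f_measure` and the parameters `k` and `seed` are hypothetical and are not taken from the thesis.

```python
import math
import random


def synthetic_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Assumption: `minority` holds at least two numeric feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a setup like this, the synthetic points would be appended to the training set only, so that the F-measure is computed on real, unseen examples.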
|
2 |
A Comparison on Supervised and Semi-Supervised Machine Learning Classifiers for Diabetes Prediction
Kola, Lokesh, Muriki, Vigneshwar January 2021
Background: Diabetes is caused mainly by high sugar levels in the blood. There is no permanent cure for diabetes. However, it can be prevented by early diagnosis. In recent years, interest in Machine Learning for disease prediction has increased, especially during the COVID-19 pandemic. At present, it is difficult for patients to visit doctors. A possible Machine Learning framework is provided that can detect diabetes at early stages. Objectives: This thesis aims to identify the critical features that impact gestational (Type-3) diabetes, and experiments are performed to identify an efficient algorithm for Type-3 diabetes prediction. The selected algorithms are Decision Trees, Random Forest, Support Vector Machine, Gaussian Naive Bayes, Bernoulli Naive Bayes, and Laplacian Support Vector Machine. The algorithms are compared based on their performance. Methods: The method consists of gathering the dataset and preprocessing the data. SelectKBest univariate feature selection was performed to select the important features that influence Type-3 diabetes prediction. A new dataset was created by binning some of the important features from the original dataset, leading to two datasets, non-binned and binned. The original dataset was imbalanced due to the unequal distribution of class labels. A train-test split was performed on both datasets, and oversampling was then applied to both training datasets to overcome the imbalance. The selected Machine Learning algorithms were trained, and predictions were made on the test data. Hyperparameter tuning was performed on all algorithms to improve performance. Predictions were made again on the test data, and accuracy, precision, recall, and F1-score were measured on both binned and non-binned datasets.
Results: Among the selected Machine Learning algorithms, Laplacian Support Vector Machine attained the highest performance, with 89.61% and 86.93% on the non-binned and binned datasets respectively. Hence, it is an efficient algorithm for Type-3 diabetes prediction. The second-best algorithm is Random Forest, with 74.5% and 72.72% on the non-binned and binned datasets. The non-binned dataset performed well for the majority of the selected algorithms. Conclusions: Laplacian Support Vector Machine achieved the highest performance among the algorithms on both binned and non-binned datasets. The non-binned dataset showed the best performance for almost all Machine Learning algorithms, except Bernoulli Naive Bayes. Therefore, the non-binned dataset is more suitable for Type-3 diabetes prediction.
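The ordering described in the Methods (bin, split, oversample only the training portion, then score on the untouched test data) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis code: the abstract does not name the oversampling technique, so random duplication of minority rows is assumed here, and feature selection and the classifiers themselves are omitted. All function names (`bin_feature`, `train_test_split`, `oversample_training`, `classification_scores`) and their parameters are hypothetical.

```python
import random
from collections import Counter


def bin_feature(value, edges):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= e for e in edges)


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and split into (train, test) pairs of (rows, labels)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def oversample_training(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest.
    Assumption: plain random oversampling; the thesis may use another method."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

Oversampling after the split, rather than before it, keeps duplicated minority rows out of the test set, so the reported scores are not inflated by examples the model has already seen.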
|