Supervised Learning Techniques : A comparison of the Random Forest and the Support Vector Machine

This thesis examines the performance of the support vector machine and the random forest in binary classification. The two techniques are compared, and the better-performing one is used to construct a final parsimonious model. The data set consists of 33 observations with 89 biomarkers as features and no known dependent variable. The dependent variable is therefore generated through k-means clustering, with the number of clusters fixed in advance at two. The algorithms are trained using five-fold cross-validation repeated twenty times. Training shows that the best-performing versions of the models are a support vector machine with a linear kernel and a random forest that randomly selects six features at each split. On the test set, the optimally tuned random forest outperforms the linear-kernel support vector machine: the random forest classifies every test observation correctly, whereas the support vector machine misclassifies one. A parsimonious random forest using only the top five features is therefore constructed, and it performs as well on the test set as the original random forest using all 89 features.
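The labelling and resampling steps described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the abstract does not name the software used, and the k-means initialisation (random distinct observations followed by Lloyd iterations), the fold-assignment rule and the helper names `kmeans_labels` and `repeated_kfold` are hypothetical choices for illustration. The demo matrix is synthetic random data with the same shape as the thesis data set (33 × 89), not the biomarkers themselves.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(data: List[Vector], k: int = 2, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means with random initial centroids; returns a cluster index per observation.

    The returned indices serve as the generated dependent variable. Which cluster
    is numbered 0 and which 1 is arbitrary.
    """
    rng = random.Random(seed)
    # Assumption: initialise the centroids at k distinct, randomly chosen observations.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in data]
        if new_labels == labels:  # no observation changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster happens to empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def repeated_kfold(n: int, folds: int = 5, repeats: int = 20, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, validation_idx) pairs: `folds` splits per repeat, reshuffled each repeat."""
    rng = random.Random(seed)
    splits: List[Tuple[List[int], List[int]]] = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(folds):
            # Every `folds`-th shuffled index goes to fold f, so fold sizes differ by at most one.
            validation = sorted(order[f::folds])
            held_out = set(validation)
            train = [i for i in range(n) if i not in held_out]
            splits.append((train, validation))
    return splits


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in with the thesis's shape (33 observations x 89 features); NOT the biomarker data.
    X = [[rng.gauss(0.0 if i < 16 else 3.0, 1.0) for _ in range(89)] for i in range(33)]
    y = kmeans_labels(X, k=2)
    splits = repeated_kfold(len(X), folds=5, repeats=20)
    print("cluster sizes:", [y.count(0), y.count(1)])
    print("resamples:", len(splits))  # 5 folds x 20 repeats
```

Five folds repeated twenty times yields 100 train/validation resamples, and each tuning candidate is scored by averaging over them. If scikit-learn is available, `sklearn.model_selection.RepeatedKFold(n_splits=5, n_repeats=20)` produces the same kind of scheme, and the two tuned models named in the abstract correspond roughly to `sklearn.svm.SVC(kernel="linear")` and `sklearn.ensemble.RandomForestClassifier(max_features=6)`, where `max_features` is the number of features considered at each split. Since the abstract does not state the software, this mapping is illustrative only.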

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-274768

machine learning

biomarkers

cross-validation

receiver operating characteristic

k-means clustering

feature selection

binary classification

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-274768
Date	January 2016
Creators	Arnroth, Lukas; Fiddler Dennis, Jonni
Publisher	Uppsala universitet, Statistiska institutionen (Uppsala University, Department of Statistics)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
