Recent decades have seen an increase in both the volume and complexity of the data used in modern business and technology. A key element in managing these data sets is the use of machine learning algorithms to process structures and find patterns. Variable selection is applied to facilitate and improve these processes by finding and removing redundant variables. One way to achieve this is to eliminate variables based on how strongly they correlate, which is the premise of this thesis. This study examines how removing correlated variables affects the predictive accuracy of six machine learning algorithms. Two restrictions are imposed: first, the correlation between the explanatory variables is set to a high level, and second, each variable’s correlation with the dependent variable is set to a modest level. The hypothesis is that removing highly correlated explanatory variables should not negatively affect accuracy. By conducting a Monte Carlo simulation with three models, each containing a different number of correlated variables, the change in accuracy could be compared and evaluated. The results show a decrease in accuracy for all algorithms except one. The differences are relatively small, the largest being a decrease of 5.49 percentage points. The conclusion is that the hypothesis does not hold when the explanatory variables have only a modest correlation with the dependent variable.
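The elimination step described above — dropping explanatory variables that correlate strongly with one another — can be sketched as follows. This is a minimal illustration, not the thesis's actual procedure: the function name, the greedy keep-first strategy, and the 0.9 threshold are all assumptions for the example; the thesis fixes its own correlation levels.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedily keep columns of X, skipping any column whose absolute
    Pearson correlation with an already-kept column exceeds `threshold`.
    Returns the reduced matrix and the indices of the kept columns."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        # keep column j only if it is not highly correlated
        # with any column retained so far
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], kept

# Example: x2 is a near-duplicate of x1, x3 is independent noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # corr(x1, x2) ~ 1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

X_reduced, kept = drop_correlated(X)
print(kept)  # [0, 2] — the near-duplicate column is removed
```

A greedy pass like this keeps the first variable of each correlated pair; other removal orders are possible and can change which variable survives, which is one reason the effect on downstream accuracy is worth measuring.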
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-466389 |
Date | January 2021 |
Creators | Johansson Lannge, Elsa |
Publisher | Uppsala universitet, Statistiska institutionen |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |