Return to search

Comparing Variable Selection Algorithms On Logistic Regression – A Simulation

When we try to understand why some schools perform worse than others, if Covid-19 has struck harder on some demographics or whether income correlates with increased happiness, we may turn to regression to better understand how these variables are correlated. To capture the true relationship between variables we may use variable selection methods in order to ensure that the variables which have an actual effect have been included in the model. Choosing the right model for variable selection is vital. Without it there is a risk of including variables which have little to do with the dependent variable or excluding variables that are important. Failing to capture the true effects would paint a picture disconnected from reality and it would also give a false impression of what reality really looks like. To mitigate this risk a simulation study has been conducted to find out what variable selection algorithms to apply in order to make more accurate inference. The different algorithms being tested are stepwise regression, backward elimination and lasso regression. Lasso performed worst when applied to a small sample but performed best when applied to larger samples. Backward elimination and stepwise regression had very similar results.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-446090
Date January 2021
CreatorsSINGH, KEVIN
PublisherUppsala universitet, Statistiska institutionen
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0022 seconds