When we try to understand why some schools perform worse than others, if Covid-19 has struck harder on some demographics or whether income correlates with increased happiness, we may turn to regression to better understand how these variables are correlated. To capture the true relationship between variables we may use variable selection methods in order to ensure that the variables which have an actual effect have been included in the model. Choosing the right model for variable selection is vital. Without it there is a risk of including variables which have little to do with the dependent variable or excluding variables that are important. Failing to capture the true effects would paint a picture disconnected from reality and it would also give a false impression of what reality really looks like. To mitigate this risk a simulation study has been conducted to find out what variable selection algorithms to apply in order to make more accurate inference. The different algorithms being tested are stepwise regression, backward elimination and lasso regression. Lasso performed worst when applied to a small sample but performed best when applied to larger samples. Backward elimination and stepwise regression had very similar results.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-446090 |
Date | January 2021 |
Creators | SINGH, KEVIN |
Publisher | Uppsala universitet, Statistiska institutionen |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0022 seconds