
BAGGED PREDICTION ACCURACY IN LINEAR REGRESSION

Bootstrap aggregation, or bagging, is a prominent statistical method proposed to improve predictive performance. It is useful both to confirm the efficacy of such improvements and to extend them. This thesis investigates whether the results of Leo Breiman's (1996) paper "Bagging Predictors", in which bagging is shown to lower prediction error, can be replicated. In addition, the predictive performance of weighted bagging is investigated, where the weights are a function of the residual variance. The data are simulated and consist of a numerical outcome variable and 30 independent variables. Linear regression is run with forward stepwise selection, selecting models with the lowest SSE, and predictions are saved for all 30 models. Separately, forward stepwise selection is run using the significance (p-value) of the added coefficient, saving only one final model. Prediction error is measured by mean squared error. The results suggest that, under SSE-based selection, both bagged methods reduce prediction error, with unweighted bagging performing best. These results are congruent with Breiman's (1996), with minor differences. Under p-value selection, weighted bagging performs best. Further research should be conducted on real data to verify these results, in particular with regard to weighted bagging.
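The thesis code is not reproduced here, but the procedure described in the abstract can be illustrated with a minimal sketch: fit linear regression on bootstrap resamples of simulated data, average the predictions (unweighted bagging), and compare against a single fit and a weighted average. The data-generating process, number of bootstrap replicates, and the inverse residual-variance weighting below are illustrative assumptions, and the forward stepwise selection step is omitted for brevity.

# Minimal sketch (not the thesis code) of bagged prediction with linear
# regression on simulated data; all parameter choices are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Simulated data: 30 predictors, only a few carrying true signal (assumption).
n_train, n_test, p = 200, 500, 30
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -0.5]
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ beta + rng.normal(scale=2.0, size=n_train)
y_test = X_test @ beta + rng.normal(scale=2.0, size=n_test)

# Single (unbagged) least-squares fit as the baseline.
baseline = LinearRegression().fit(X_train, y_train)
mse_single = mean_squared_error(y_test, baseline.predict(X_test))

# Bagging: refit on B bootstrap resamples and aggregate the predictions.
B = 50
preds = np.empty((B, n_test))
resid_var = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n_train, size=n_train)   # bootstrap sample
    model = LinearRegression().fit(X_train[idx], y_train[idx])
    preds[b] = model.predict(X_test)
    resid = y_train[idx] - model.predict(X_train[idx])
    resid_var[b] = resid.var(ddof=1)

# Unweighted bagging: simple average of the bootstrap predictions.
mse_bagged = mean_squared_error(y_test, preds.mean(axis=0))

# Weighted bagging: weights as a function of residual variance
# (inverse-variance weights used here purely as an illustrative choice).
w = 1.0 / resid_var
w /= w.sum()
mse_weighted = mean_squared_error(y_test, w @ preds)

print(f"single model MSE:    {mse_single:.3f}")
print(f"bagged MSE:          {mse_bagged:.3f}")
print(f"weighted bagged MSE: {mse_weighted:.3f}")

Running the sketch prints the three test-set MSEs side by side; it is meant only to show the shape of the comparison, not to reproduce the thesis results.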

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-477124
Date: January 2022
Creators: Kimby, Daniel
Publisher: Uppsala universitet, Statistiska institutionen
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
