
Statistical properties of forward selection regression estimators

In practice, when many candidate explanatory variables are available for a multiple regression, there is always the possibility that variables which are important determinants of the response variable are omitted from the model, while unimportant variables are included. Both types of error matter, and this dissertation attempts to quantify their probabilities by means of a simulation study. The number of candidate variables ranges from p = 4 to 20, and the sample sizes considered are n = 0.5p, p, 2p and 4p. For each p the underlying model assumes that roughly half of the independent variables are actually correlated with the dependent variable and the other half are not. The noise is ε ~ N(0, σ²), where σ² is held fixed. The data were simulated 10 000 times for each combination of n and p, using known underlying models and ε drawn at random from a normal distribution. The full model and forward selection regression are compared. The mean squared error of the estimated coefficients β(p) relative to the true β is determined for each combination of n and p. A full discussion, as well as graphs, is presented. / Dissertation (MSc)--University of Pretoria, 2011. / Statistics / unrestricted
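The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's code, and several details are assumptions made only for illustration: independent standard normal predictors, true coefficients of 1 for the first half of the variables and 0 for the rest, a no-intercept model, a partial-F entry threshold of 4.0 as the stopping rule, and the function names `forward_selection` and `simulate`. The defaults use n = 4p and far fewer replications than the 10 000 reported, and the n = 0.5p case (where the full model is not uniquely estimable by ordinary least squares) is not treated specially.

```python
import numpy as np


def forward_selection(X, y, f_to_enter=4.0):
    """Greedy forward selection; the partial-F entry threshold is an assumed rule."""
    n, p = X.shape
    selected = []
    rss = float(y @ y)  # residual sum of squares of the empty (no-intercept) model
    while len(selected) < min(p, n - 1):
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            cand_rss = float(resid @ resid)
            if cand_rss < best_rss:
                best_j, best_rss = j, cand_rss
        if best_j is None:
            break
        df = n - len(selected) - 1  # residual degrees of freedom after adding best_j
        f_stat = (rss - best_rss) / max(best_rss / df, 1e-300)
        if f_stat < f_to_enter:
            break
        selected.append(best_j)
        rss = best_rss
    return selected


def simulate(p=6, n=24, sigma=1.0, reps=200, seed=0):
    """Compare full-model OLS with forward selection over repeated simulated data."""
    rng = np.random.default_rng(seed)
    n_active = p // 2
    beta_true = np.zeros(p)
    beta_true[:n_active] = 1.0  # assumed: first half of the variables are active
    se_full = se_fs = 0.0
    omitted = included = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ beta_true + sigma * rng.standard_normal(n)
        b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        sel = forward_selection(X, y)
        b_fs = np.zeros(p)  # unselected variables get coefficient 0
        if sel:
            coef, *_ = np.linalg.lstsq(X[:, sel], y, rcond=None)
            b_fs[sel] = coef
        se_full += float(np.sum((b_full - beta_true) ** 2))
        se_fs += float(np.sum((b_fs - beta_true) ** 2))
        omitted += sum(j not in sel for j in range(n_active))
        included += sum(j >= n_active for j in sel)
    return {
        "mse_full": se_full / (reps * p),
        "mse_forward": se_fs / (reps * p),
        "omission_rate": omitted / (reps * n_active),
        "false_inclusion_rate": included / (reps * (p - n_active)),
    }


if __name__ == "__main__":
    print(simulate())
```

Here `omission_rate` estimates the probability that an important variable is left out and `false_inclusion_rate` the probability that an unimportant one is included, the two error types the abstract refers to. Both depend heavily on the assumed threshold and coefficient sizes.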

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:up/oai:repository.up.ac.za:2263/27014
Date: 04 August 2011
Creators: Thiebaut, Nicolene Magrietha
Contributors: Steffens, Francois E., nicolene.thiebaut@gmail.com
Publisher: University of Pretoria
Source Sets: South African National ETD Portal
Detected Language: English
Type: Dissertation
Rights: © 2011 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
