It is generally recognized that not all of the available variables should necessarily be used as predictors in a linear regression equation. The problems that may arise from using too many predictors become especially acute when the regression equation is used for prediction with independent data. In this case, the skill of prediction may actually deteriorate as the number of predictors increases. However, there is no definitive explanation as to why this should be so. There is also no universally accepted procedure for determining the number of predictors to use. The various regression methods that do exist are logically contrived but are also largely based on subjective considerations.

The goal of this research is to develop and test a criterion that will indicate a priori the "optimum" number of predictors to use in a prediction equation. The mean square error statistic is used to evaluate the performance of a regression equation in both the dependent and independent samples. Selecting the "best" prediction equation consists of determining the equation with the minimum estimated independent sample mean square error. Several approximations and estimators of the independent sample mean square error that have appeared in the literature are discussed, and two new estimators are derived.

These approximations and estimators are tested in Monte Carlo simulations to determine their skill in indicating the number of predictors that will yield the best prediction equation. The sample size, number of available predictors, correlations among the variables, distribution of the variables, and selection method are manipulated to explore how these factors influence the performance of the mean square error estimators. It is found that the better estimators are capable of indicating a number of predictors to include in the regression equation for which the corresponding independent sample mean square error is near the minimum value.
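The effect described above can be illustrated with a small Monte Carlo sketch. This is not the dissertation's procedure and does not implement any of its estimators: it measures the independent sample mean square error directly on a large simulated independent sample, which is exactly what an a priori estimator is meant to avoid. The sample size, number of predictors, coefficients, and the names `forward_trace` and `mse_curves` are all assumptions chosen for illustration.

```python
import numpy as np


def forward_trace(X, y):
    """Order predictors by greedy forward selection (largest drop in residual sum of squares)."""
    n, k = X.shape
    chosen, remaining = [], list(range(k))
    while remaining:
        best_j, best_rss = None, np.inf
        for j in remaining:
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ coef) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen


def mse_curves(n=30, k=10, n_true=3, n_indep=2000, trials=100, seed=0):
    """Average dependent and independent sample MSE for p = 1..k predictors."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(k)
    beta[:n_true] = 0.5  # assumed: only the first n_true predictors carry signal
    dep = np.zeros(k)
    ind = np.zeros(k)
    for _ in range(trials):
        X = rng.standard_normal((n, k))
        y = X @ beta + rng.standard_normal(n)
        Xi = rng.standard_normal((n_indep, k))  # independent sample
        yi = Xi @ beta + rng.standard_normal(n_indep)
        order = forward_trace(X, y)
        for p in range(1, k + 1):
            cols = order[:p]
            A = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            dep[p - 1] += np.mean((y - A @ coef) ** 2)
            Ai = np.column_stack([np.ones(n_indep), Xi[:, cols]])
            ind[p - 1] += np.mean((yi - Ai @ coef) ** 2)
    return dep / trials, ind / trials


dep_mse, ind_mse = mse_curves()
best_p = int(np.argmin(ind_mse)) + 1  # number of predictors with minimum independent MSE
for p, (d, i) in enumerate(zip(dep_mse, ind_mse), start=1):
    print(f"p={p:2d}  dependent MSE={d:.3f}  independent MSE={i:.3f}")
print("p with minimum independent sample MSE:", best_p)
```

The dependent sample MSE can only fall as predictors are added, because each equation in the forward trace contains the previous one. The independent sample MSE need not follow it, and the value of p at its minimum is the quantity the estimators in the abstract aim to indicate without access to the independent sample.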
As a practical test, the various estimators of the independent sample mean square error are applied to the data used in deriving the Model Output Statistics (MOS) maximum and minimum temperature forecast equations used by the National Weather Service. These prediction equations are linear regression equations derived using a forward selection method. The sequence of prediction equations corresponding to the forward trace of all the available predictors is derived for each of 192 cases and then applied to independent data. The forecasts made by the operational p = 10 predictor MOS equations are compared with those made by the equations determined by the estimators of the independent sample mean square error. The operational equations have the best overall verification statistics. The estimators persistently underestimate the values of the independent sample mean square error, but one of the new estimators is able to determine MOS forecast equations that perform as well as the operational equations. Furthermore, it is able to accomplish this without the use of an independent sample to help determine the optimum number of predictors.

Source: Dissertation Abstracts International, Volume: 41-05, Section: B, page: 1822.

Thesis (Ph.D.)--The Florida State University, 1980.
Identifier | oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_74192 |
Contributors | CARR, MEG BRADY., Florida State University |
Source Sets | Florida State University |
Detected Language | English |
Type | Text |
Format | 288 p. |
Rights | On campus use only. |
Relation | Dissertation Abstracts International |