
Estimating R² Shrinkage in Multiple Regression: A Comparison of Different Analytical Methods

This study investigated the effectiveness of various analytical methods used for estimating R² shrinkage in multiple regression analysis. Two categories of analytical formulae were identified: estimators of the population squared multiple correlation coefficient (ρ²), and estimators of the population cross-validity coefficient (ρc²). To avoid possible confounding factors that might be associated with a real data set such as data nonnormality, lack of precise population parameters, different degrees of multicollinearity among the predictor variables, and so forth, the Monte Carlo method was used to simulate multivariate normal sample data, with prespecified population parameters such as the squared multiple correlation coefficient (ρ²), number of predictors, different sample sizes, known degree of multicollinearity, and controlled data normality conditions. Five hundred replicates were simulated within each cell of the sampling conditions. Various analytical formulae were applied to the simulated data in each sampling condition, and the "adjusted" coefficients were obtained and then compared to their corresponding population parameters (ρ² and ρc²).
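The simulation design described above can be illustrated with a minimal sketch. It is not the author's code. It assumes uncorrelated standard-normal predictors (so it ignores the multicollinearity conditions the study varied), equal regression weights, and NumPy. The function name `simulate_r2` and the cell values (ρ² = 0.3, n = 30, p = 4) are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_r2(rho2, n, p, reps=500, seed=0):
    """Return the sample R^2 from each of `reps` simulated regressions.

    Assumption: predictors are independent N(0, 1), so setting every
    weight to sqrt(rho2 / p) and the error variance to 1 - rho2 gives
    Var(y) = 1 and a population squared multiple correlation of rho2.
    """
    rng = np.random.default_rng(seed)
    beta = np.full(p, np.sqrt(rho2 / p))
    r2 = np.empty(reps)
    for i in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - rho2), n)
        A = np.column_stack([np.ones(n), X])          # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
        resid = y - A @ coef
        r2[i] = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2

# One hypothetical cell of the design: 500 replicates, n/p = 7.5.
r2 = simulate_r2(rho2=0.3, n=30, p=4)
mean_r2 = r2.mean()  # tends to exceed 0.3, which is the shrinkage problem
```

Averaging the unadjusted sample R² over replicates and comparing it with the known ρ² shows the upward bias that the analytical formulae are meant to correct.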
Analysis of the results indicates that the "Wherry" formula, currently the most widely used (in both SAS and SPSS), is probably not the most effective analytical formula for estimating ρ². Instead, the Pratt formula appeared to outperform the other analytical formulae across most of these sampling conditions. Among the analytical formulae designed to estimate ρc², the Browne formula appeared to be the most effective and stable in minimizing statistical bias across different sampling conditions. The study also concludes that the n/p ratio (sample size to number of predictor variables) affects the performance of these analytical formulae the most; different degrees of multicollinearity among the predictor variables do not have a dramatic influence on their performance. Further replications on both real and simulated data are still needed to investigate the effectiveness of these analytical formulae.
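For orientation, the three formulae named above can be sketched as follows, in the forms commonly given in the shrinkage literature. This record does not reproduce the formulae, so the exact variants used in the thesis are an assumption. In particular, the Browne sketch plugs in the Wherry estimate for ρ² and approximates ρ⁴ by its square, omitting the bias correction for ρ⁴ that fuller statements of the formula include. The function names are hypothetical.

```python
def wherry(r2, n, p):
    """Adjusted R^2 as reported by SAS and SPSS: an estimator of rho^2."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)

def pratt(r2, n, p):
    """Pratt estimator of rho^2 (form commonly cited in the literature)."""
    u = 1.0 - r2
    return 1.0 - (n - 3) * u / (n - p - 1) * (1.0 + 2.0 * u / (n - p - 2.3))

def browne(r2, n, p):
    """Browne estimator of the cross-validity coefficient rho_c^2.

    Simplification (assumption): rho^2 is replaced by the Wherry estimate,
    floored at zero, and rho^4 by the square of that estimate.
    """
    rho2 = max(wherry(r2, n, p), 0.0)
    rho4 = rho2 ** 2
    return ((n - p - 3) * rho4 + rho2) / ((n - 2 * p - 2) * rho2 + p)

# Hypothetical sample result: R^2 = 0.40 with n = 30 cases and p = 4 predictors.
small_sample = (wherry(0.40, 30, 4), pratt(0.40, 30, 4), browne(0.40, 30, 4))
# Same R^2 and p with n = 300: a larger n/p ratio means far less shrinkage.
large_sample = wherry(0.40, 300, 4)
```

With these inputs the cross-validity estimate falls below both ρ² estimates, and raising n from 30 to 300 moves the Wherry value much closer to the unadjusted 0.40, in line with the abstract's point about the n/p ratio.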

Identifier: oai:union.ndltd.org:UTAHS/oai:digitalcommons.usu.edu:etd-7222
Date: 01 May 1999
Creators: Yin, Ping
Publisher: DigitalCommons@USU
Source Sets: Utah State University
Detected Language: English
Type: text
Format: application/pdf
Source: All Graduate Theses and Dissertations
Rights: Copyright for this work is held by the author. Transmission or reproduction of materials protected by copyright beyond that allowed by fair use requires the written permission of the copyright owners. Works not in the public domain cannot be commercially exploited without permission of the copyright owner. Responsibility for any use rests exclusively with the user. For more information contact digitalcommons@usu.edu.
