
Best-subset model selection based on multitudinal assessments of likelihood improvements

Given a set of potential explanatory variables, one model selection approach is to select the best model, according to some criterion, from among the collection of models defined by all possible subsets of the explanatory variables. A popular procedure that has been used in this setting is to select the model that results in the smallest value of the Akaike information criterion (AIC). One drawback in using the AIC is that it can lead to the frequent selection of overspecified models. This can be problematic if the researcher wishes to assert, with some level of certainty, the necessity of any given variable that has been selected.
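As a rough illustration of best-subset selection by minimum AIC, the sketch below enumerates every subset of the candidate columns in a Gaussian linear model and keeps the one with the smallest AIC. The function names `gaussian_aic` and `best_subset_by_aic` are hypothetical, and the use of ordinary least squares with an intercept in every model is an assumption made for the example rather than something taken from the thesis.

```python
import itertools

import numpy as np


def gaussian_aic(y, design):
    """AIC of an ordinary least squares fit, assuming Gaussian errors."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    # -2 * maximized log-likelihood equals n * log(2 * pi * rss / n) + n
    return n * np.log(2 * np.pi * rss / n) + n + 2 * k


def best_subset_by_aic(y, X):
    """Return (aic, subset) for the column subset of X with the smallest AIC.

    Every candidate model includes an intercept (an assumption of this sketch).
    """
    n, p = X.shape
    intercept = np.ones((n, 1))
    best_aic, best_subset = np.inf, ()
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            columns = X[:, np.array(subset, dtype=int)]
            aic = gaussian_aic(y, np.hstack([intercept, columns]))
            if aic < best_aic:
                best_aic, best_subset = aic, subset
    return best_aic, best_subset
```

The search visits all 2^p subsets, so it is only practical for a modest number of candidate variables. Because the AIC penalty is a fixed 2 per parameter, the selected subset can contain variables beyond those that generated the data, which is the overspecification issue described above.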
This thesis develops a model selection procedure that allows the researcher to nominate, a priori, the probability at which overspecified models will be selected from among all possible subsets. The procedure seeks to determine whether the inclusion of each candidate variable results in a sufficiently improved fitting term, and hence is referred to as the SIFT procedure. To determine whether there is sufficient evidence to retain a candidate variable, a set of threshold values is computed. Two methods for computing these thresholds are proposed: a naive method based on a set of restrictive assumptions, and an empirical permutation-based method.
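The abstract does not spell out how the thresholds are computed, so the following is not the thesis's algorithm. It is only a generic sketch of what a permutation-based threshold could look like, under these assumptions: a Gaussian linear model, the fitting term taken to be minus twice the maximized log-likelihood, and a single candidate variable permuted to break its association with the response. The names `fit_improvement` and `permutation_threshold`, and the parameters `alpha` and `n_perm`, are hypothetical.

```python
import numpy as np


def rss(y, design):
    """Residual sum of squares of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta) ** 2))


def fit_improvement(y, base, candidate):
    """Drop in -2 * log-likelihood when `candidate` is added to `base`.

    For a Gaussian linear model this equals n * log(RSS_base / RSS_full).
    """
    full = np.column_stack([base, candidate])
    return len(y) * np.log(rss(y, base) / rss(y, full))


def permutation_threshold(y, base, candidate, alpha=0.05, n_perm=500, seed=0):
    """Upper-alpha quantile of the improvement obtained from permuted candidates.

    Assumption of this sketch: shuffling the candidate column approximates
    the improvement a spurious variable would produce.
    """
    rng = np.random.default_rng(seed)
    null = [
        fit_improvement(y, base, rng.permutation(candidate))
        for _ in range(n_perm)
    ]
    return float(np.quantile(null, 1 - alpha))
```

Under this sketch, a candidate would be retained when `fit_improvement(y, base, candidate)` exceeds `permutation_threshold(y, base, candidate, alpha)`, with `alpha` playing the role of the nominated probability of admitting a spurious variable.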
Graphical tools have also been developed to be used in conjunction with the SIFT procedure. The graphical representation of the SIFT procedure clarifies the process being undertaken. Using these tools can also assist researchers in developing a deeper understanding of the data they are analyzing.
The naive and empirical SIFT methods are investigated by way of simulation under a range of conditions within the standard linear model framework. The performance of the SIFT methodology is compared with model selection by minimum AIC, minimum Bayesian information criterion (BIC), and backward elimination based on p-values. The SIFT procedure is found to behave as designed: asymptotically, it selects the variables that characterize the underlying data-generating mechanism while limiting the selection of false or spurious variables to the desired level.
The SIFT methodology offers researchers a promising new approach to model selection, whereby they are now able to control the probability of selecting an overspecified model to a level that best suits their needs.

Identifier: oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-7204
Date: 01 December 2013
Creators: Carter, Knute Derek
Contributors: Cavanaugh, Joseph E.
Publisher: University of Iowa
Source Sets: University of Iowa
Language: English
Detected Language: English
Type: dissertation
Format: application/pdf
Source: Theses and Dissertations
Rights: Copyright © 2013 Knute Derek Carter
