Best-subset model selection based on multitudinal assessments of likelihood improvements

Given a set of potential explanatory variables, one model selection approach is to select the best model, according to some criterion, from among the collection of models defined by all possible subsets of the explanatory variables. A popular procedure in this setting is to select the model with the smallest value of the Akaike information criterion (AIC). One drawback of the AIC is that it can lead to the frequent selection of overspecified models. This can be problematic if the researcher wishes to assert, with some level of certainty, the necessity of any given variable that has been selected.
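To make the best-subset idea concrete, the following is a minimal sketch of exhaustive selection by minimum AIC for a Gaussian linear model. It is an illustration only, not code from the thesis: the function names `gaussian_aic` and `best_subset_by_aic` are hypothetical, and the AIC is computed under the assumption of ordinary least squares with normally distributed errors.

```python
import itertools

import numpy as np


def gaussian_aic(y, design):
    """AIC of an OLS fit with Gaussian errors.

    `design` must already contain the intercept column.
    """
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # -2 * maximized log-likelihood for a Gaussian linear model
    neg2_loglik = n * (np.log(2 * np.pi * rss / n) + 1)
    # k regression coefficients plus one error variance
    return neg2_loglik + 2 * (k + 1)


def best_subset_by_aic(y, X):
    """Fit every subset of the columns of X and return (min AIC, column indices)."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    best_aic, best_subset = np.inf, ()
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            design = np.hstack([intercept, X[:, list(subset)]])
            aic = gaussian_aic(y, design)
            if aic < best_aic:
                best_aic, best_subset = aic, subset
    return best_aic, best_subset
```

Because all 2^p subsets are fitted, this sketch is only practical for a small number of candidate variables. Note that the minimum-AIC subset may contain variables beyond those that truly generate the data, which is the overspecification issue described above.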
This thesis develops a model selection procedure that allows the researcher to nominate, a priori, the probability at which overspecified models will be selected from among all possible subsets. The procedure seeks to determine whether the inclusion of each candidate variable results in a sufficiently improved fitting term, and hence is referred to as the SIFT procedure. To determine whether there is sufficient evidence to retain a candidate variable, a set of threshold values is computed. Two methods are proposed: a naive method based on a set of restrictive assumptions, and an empirical permutation-based method.
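The abstract does not spell out how the thresholds are computed, so the sketch below is not the SIFT algorithm itself. It only illustrates the general idea of a permutation-based threshold: the improvement in the fitting term (here, the drop in -2 log-likelihood of a Gaussian linear model) from adding one candidate variable is compared with the improvements obtained after randomly shuffling that variable. The function names, the default `alpha`, and the number of permutations are all assumptions made for illustration.

```python
import numpy as np


def neg2_loglik(y, design):
    """-2 * maximized Gaussian log-likelihood of an OLS fit."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * (np.log(2 * np.pi * rss / n) + 1)


def permutation_threshold(y, base, candidate, alpha=0.05, n_perm=500, seed=0):
    """Return (observed improvement, permutation threshold) for one candidate.

    `base` is the design matrix of the current model (including intercept);
    `candidate` is the column being considered for inclusion.
    """
    rng = np.random.default_rng(seed)
    base_fit = neg2_loglik(y, base)
    null_gains = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling breaks any association between the candidate and y
        shuffled = rng.permutation(candidate)
        null_gains[i] = base_fit - neg2_loglik(y, np.column_stack([base, shuffled]))
    observed = base_fit - neg2_loglik(y, np.column_stack([base, candidate]))
    # Improvement exceeded by roughly a fraction alpha of shuffled variables
    threshold = float(np.quantile(null_gains, 1 - alpha))
    return observed, threshold
```

Under this sketch, a candidate would be retained only if `observed` exceeds `threshold`, so `alpha` plays the role of a nominated rate at which a spurious variable is admitted.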
Graphical tools have also been developed to be used in conjunction with the SIFT procedure. The graphical representation of the SIFT procedure clarifies the process being undertaken. Using these tools can also assist researchers in developing a deeper understanding of the data they are analyzing.
The naive and empirical SIFT methods are investigated by way of simulation under a range of conditions within the standard linear model framework. The performance of the SIFT methodology is compared with model selection by minimum AIC, minimum Bayesian information criterion (BIC), and backward elimination based on p-values. The SIFT procedure is found to behave as designed, asymptotically selecting those variables that characterize the underlying data-generating mechanism while limiting the selection of false or spurious variables to the desired level.
The SIFT methodology offers researchers a promising new approach to model selection, whereby they are now able to control the probability of selecting an overspecified model to a level that best suits their needs.

AIC

BIC

Information Criterion

Identifier	oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-7204
Date	01 December 2013
Creators	Carter, Knute Derek
Contributors	Cavanaugh, Joseph E.
Publisher	University of Iowa
Source Sets	University of Iowa
Language	English
Detected Language	English
Type	dissertation
Format	application/pdf
Source	Theses and Dissertations
Rights	Copyright © 2013 Knute Derek Carter
