Variable selection is an important topic in regression analysis and is intended to select the best subset of predictors. Least absolute shrinkage and selection operator (Lasso) was introduced by Tibshirani in 1996. This method can serve as a tool for variable selection because it shrinks some coefficients to exact zero by a constraint on the sum of absolute values of regression coefficients. For logistic regression, Lasso modifies the traditional parameter estimation method, maximum log likelihood, by adding the L1 norm of the parameters to the negative log likelihood function, so it turns a maximization problem into a minimization one. To solve this problem, we first need to give the value for the parameter of the L1 norm, called tuning parameter. Since the tuning parameter affects the coefficients estimation and variable selection, we want to find the optimal value for the tuning parameter to get the most accurate coefficient estimation and best subset of predictors in the L1 regularized regression model. There are two popular methods to select the optimal value of the tuning parameter that results in a best subset of predictors, Bayesian information criterion (BIC) and cross validation (CV). The objective of this paper is to evaluate and compare these two methods for selecting the optimal value of tuning parameter in terms of coefficients estimation accuracy and variable selection through simulation studies.
Identifer | oai:union.ndltd.org:vcu.edu/oai:scholarscompass.vcu.edu:etd-3939 |
Date | 05 December 2012 |
Creators | Shi, Shujing |
Publisher | VCU Scholars Compass |
Source Sets | Virginia Commonwealth University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | © The Author |
Page generated in 0.0249 seconds