Spelling suggestions: "subject:"penalized regression"" "subject:"menalized regression""
1 |
A numerical study of penalized regressionYu, Han 22 August 2013 (has links)
In this thesis, we review important aspects and issues of multiple linear regression, in particular on the problem of multi-collinearity.
The focus is on a numerical study of different methods of penalized regression, including the ridge regression, lasso regression and elastic net regression, as well as the newly introduced correlation adjusted regression and correlation adjusted elastic net regression. We compare the performance and relative advantages of these methods.
|
2 |
A numerical study of penalized regressionYu, Han 22 August 2013 (has links)
In this thesis, we review important aspects and issues of multiple linear regression, in particular on the problem of multi-collinearity.
The focus is on a numerical study of different methods of penalized regression, including the ridge regression, lasso regression and elastic net regression, as well as the newly introduced correlation adjusted regression and correlation adjusted elastic net regression. We compare the performance and relative advantages of these methods.
|
3 |
Penalized methods in genome-wide association studiesLiu, Jin 01 July 2011 (has links)
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as the LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient and stable in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP (SMCP) and the proposed approach as the SMCP method. Performance of the proposed SMCP method and its comparison with a LASSO approach are evaluated through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using data from a GWAS on rheumatoid arthritis. Based on the idea of SMCP, we propose a new penalized method for group variable selection in GWAS with respect to the correlation between adjacent groups. The proposed method uses the group LASSO for encouraging group sparsity and a quadratic difference for adjacent group smoothing. We call it smoothed group LASSO, or SGL for short. Canonical correlations between two adjacent groups of SNPS are used as the weights in the quadratic difference penalty. Principal components are used to reduced dimensionality locally within groups. We derive a group coordinate descent algorithm for computing the solution path of the SGL. Simulation studies are used to evaluate the finite sample performance of the SGL and group LASSO. We also demonstrate its applicability on rheumatoid arthritis data.
|
4 |
penalized: A MATLAB toolbox for fitting generalized linear models with penaltiesMcIlhagga, William H. 07 August 2015 (has links)
Yes / penalized is a
exible, extensible, and e cient MATLAB toolbox for penalized maximum
likelihood. penalized allows you to t a generalized linear model (gaussian, logistic,
poisson, or multinomial) using any of ten provided penalties, or none. The toolbox can
be extended by creating new maximum likelihood models or new penalties. The toolbox
also includes routines for cross-validation and plotting.
|
5 |
Robust Approaches to Marker Identification and Evaluation for Risk AssessmentDai, Wei January 2013 (has links)
Assessment of risk has been a key element in efforts to identify factors associated with disease, to assess potential targets of therapy and enhance disease prevention and treatment. Considerable work has been done to develop methods to identify markers, construct risk prediction models and evaluate such models. This dissertation aims to develop robust approaches for these tasks. In Chapter 1, we present a robust, flexible yet powerful approach to identify genetic variants that are associated with disease risk in genome-wide association studies when some subjects are related. In Chapter 2, we focus on identifying important genes predictive of survival outcome when the number of covariates greatly exceeds the number of observations via a nonparametric transformation model. We propose a rank-based estimator that poses minimal assumptions and develop an efficient
|
6 |
Regularized Markov Model for Modeling Disease TransitioningHuang, Shuang, Huang, Shuang January 2017 (has links)
In longitudinal studies of chronic diseases, the disease states of individuals are often collected at several pre-scheduled clinical visits, but the exact states and the times of transitioning from one state to another between observations are not observed. This is commonly referred to as "panel data". Statistical challenges arise in panel data in regard to identifying predictors governing the transitions between different disease states with only the partially observed disease history. Continuous-time Markov models (CTMMs) are commonly used to analyze panel data, and allow maximum likelihood estimations without making any assumptions about the unobserved states and transition times. By assuming that the underlying disease process is Markovian, CTMMs yield tractable likelihood. However, CTMMs generally allow covariate effect to differ for different transitions, resulting in a much higher number of coefficients to be estimated than the number of covariates, and model overfitting can easily happen in practice. In three papers, I develop a regularized CTMM using the elastic net penalty for panel data, and implement it in an R package. The proposed method is capable of simultaneous variable selection and estimation even when the dimension of the covariates is high.
In the first paper (Section 2), I use elastic net penalty to regularize the CTMM, and derive an efficient coordinate descent algorithm to solve the corresponding optimization problem. The algorithm takes advantage of the multinomial state distribution under the non-informative observation scheme assumption to simplify computation of key quantities. Simulation study shows that this method can effectively select true non-zero predictors while reducing model size.
In the second paper (Section 3), I extend the regularized CTMM developed in the previous paper to accommodate exact death times and censored states. Death is commonly included as an endpoint in longitudinal studies, and exact time of death can be easily obtained but the state path leading to death is usually unknown. I show that exact death times result in a very different form of likelihood, and the dependency of death time on the model requires significantly different numerical methods for computing the derivatives of the log likelihood, a key quantity for the coordinate descent algorithm. I propose to use numerical differentiation to compute the derivatives of the log likelihood. Computation of the derivatives of the log likelihood from a transition involving a censored state is also discussed. I carry out a simulation study to evaluate the performance of this extension, which shows consistently good variable selection properties and comparable prediction accuracy compared to the oracle models where only true non-zero coefficient are fitted. I then apply the regularized CTMM to the airflow limitation data to the TESAOD (The Tucson Epidemiological Study of Airway Obstructive Disease) study with exact death times and censored states, and obtain a prediction model with great size reduction from a total of 220 potential parameters.
Methods developed in the first two papers are implemented in an R package markovnet, and a detailed introduction to the key functionalities of the package is demonstrated with a simulated data set in the third paper (Section 4). Finally, some conclusion remarks are given and directions to future work are discussed (Section 5).
The outline for this dissertation is as follows. Section 1 presents an in-depth background regarding panel data, CTMMs, and penalized regression methods, as well as an brief description of the TESAOD study design. Section 2 describes the first paper entitled "Regularized continuous-time Markov model via elastic net'". Section 3 describes the second paper entitled "Regularized continuous-time Markov model with exact death times and censored states"'. Section 4 describes the third paper "Regularized continuous-time Markov model for panel data: the markovnet package for R"'. Section 5 gives an overall summary and a discussion of future work.
|
7 |
Consistent bi-level variable selection via composite group bridge penalized regressionSeetharaman, Indu January 1900 (has links)
Master of Science / Department of Statistics / Kun Chen / We study the composite group bridge penalized regression methods for conducting bilevel variable selection in high dimensional linear regression models with a diverging number of predictors. The proposed method combines the ideas of bridge regression (Huang et al., 2008a) and group bridge regression (Huang et al., 2009), to achieve variable selection consistency
in both individual and group levels simultaneously, i.e., the important groups and
the important individual variables within each group can both be correctly identi ed with
probability approaching to one as the sample size increases to in nity. The method takes full advantage of the prior grouping information, and the established bi-level oracle properties ensure that the method is immune to possible group misidenti cation. A related adaptive group bridge estimator, which uses adaptive penalization for improving bi-level selection, is also investigated. Simulation studies show that the proposed methods have superior performance in comparison to many existing methods.
|
8 |
Ridle for sparse regression with mandatory covariates with application to the genetic assessment of histologic grades of breast cancerZhai, Jing, Hsu, Chiu-Hsieh, Daye, Z. John 25 January 2017 (has links)
Background: Many questions in statistical genomics can be formulated in terms of variable selection of candidate biological factors for modeling a trait or quantity of interest. Often, in these applications, additional covariates describing clinical, demographical or experimental effects must be included a priori as mandatory covariates while allowing the selection of a large number of candidate or optional variables. As genomic studies routinely require mandatory covariates, it is of interest to propose principled methods of variable selection that can incorporate mandatory covariates. Methods: In this article, we propose the ridge-lasso hybrid estimator (ridle), a new penalized regression method that simultaneously estimates coefficients of mandatory covariates while allowing selection for others. The ridle provides a principled approach to mitigate effects of multicollinearity among the mandatory covariates and possible dependency between mandatory and optional variables. We provide detailed empirical and theoretical studies to evaluate our method. In addition, we develop an efficient algorithm for the ridle. Software, based on efficient Fortran code with R-language wrappers, is publicly and freely available at https://sites.google.com/site/zhongyindaye/software. Results: The ridle is useful when mandatory predictors are known to be significant due to prior knowledge or must be kept for additional analysis. Both theoretical and comprehensive simulation studies have shown that the ridle to be advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. A microarray gene expression analysis of the histologic grades of breast cancer has identified 24 genes, in which 2 genes are selected only by the ridle among current methods and found to be associated with tumor grade. Conclusions: In this article, we proposed the ridle as a principled sparse regression method for the selection of optional variables while incorporating mandatory ones. Results suggest that the ridle is advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves.
|
9 |
Regularized methods for high-dimensional and bi-level variable selectionBreheny, Patrick John 01 July 2009 (has links)
Many traditional approaches cease to be useful when the number of variables is large in comparison with the sample size. Penalized regression methods have proved to be an attractive approach, both theoretically and empirically, for dealing with these problems. This thesis focuses on the development of penalized regression methods for high-dimensional variable selection. The first part of this thesis deals with problems in which the covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. I introduce a framework for grouped penalization that encompasses the previously proposed group lasso and group bridge methods, sheds light on the behavior of grouped penalties, and motivates the proposal of a new method, group MCP.
The second part of this thesis develops fast algorithms for fitting models with complicated penalty functions such as grouped penalization methods. These algorithms combine the idea of local approximation of penalty functions with recent research into coordinate descent algorithms to produce highly efficient numerical methods for fitting models with complicated penalties. Importantly, I show these algorithms to be both stable and linear in the dimension of the feature space, allowing them to be efficiently scaled up to very large problems.
In the third part of this thesis, I extend the idea of false discovery rates to penalized regression. The Karush-Kuhn-Tucker conditions describing penalized regression estimates provide testable hypotheses involving partial residuals. I use these hypotheses to connect the previously disparate elds of multiple comparisons and penalized regression, develop estimators for the false discovery rates of methods such as the lasso and elastic net, and establish theoretical results.
Finally, the methods from all three sections are studied in a number of simulations and applied to real data from gene expression and genetic association studies.
|
10 |
Grouped variable selection in high dimensional partially linear additive Cox modelLiu, Li 01 December 2010 (has links)
In the analysis of survival outcome supplemented with both clinical information and high-dimensional gene expression data, traditional Cox proportional hazard model fails to meet some emerging needs in biological research. First, the number of covariates is generally much larger the sample size. Secondly, predicting an outcome with individual gene expressions is inadequate because a gene's expression is regulated by multiple biological processes and functional units. There is a need to understand the impact of changes at a higher level such as molecular function, cellular component, biological process, or pathway. The change at a higher level is usually measured with a set of gene expressions related to the biological process. That is, we need to model the outcome with gene sets as variable groups and the gene sets could be partially overlapped also.
In this thesis work, we investigate the impact of a penalized Cox regression procedure on regularization, parameter estimation, variable group selection, and nonparametric modeling of nonlinear eects with a time-to-event outcome.
We formulate the problem as a partially linear additive Cox model with high-dimensional data. We group genes into gene sets and approximate the nonparametric components by truncated series expansions with B-spline bases. After grouping and approximation, the problem of variable selection becomes that of selecting groups of coecients in a gene set or in an approximation. We apply the group Lasso to obtain an initial solution path and reduce the dimension of the problem and then update the whole solution path with the adaptive group Lasso. We also propose a generalized group lasso method to provide more freedom in specifying the penalty and excluding covariates from being penalized.
A modied Newton-Raphson method is designed for stable and rapid computation. The core programs are written in the C language. An user-friendly R interface is implemented to perform all the calculations by calling the core programs.
We demonstrate the asymptotic properties of the proposed methods. Simulation studies are carried out to evaluate the finite sample performance of the proposed procedure using several tuning parameter selection methods for choosing the point on the solution path as the nal estimator. We also apply the proposed approach on two real data examples.
|
Page generated in 0.1038 seconds