1 |
Grouped variable selection in high dimensional partially linear additive Cox model
Liu, Li 01 December 2010
In the analysis of survival outcomes supplemented with both clinical information and high-dimensional gene expression data, the traditional Cox proportional hazards model fails to meet some emerging needs in biological research. First, the number of covariates is generally much larger than the sample size. Second, predicting an outcome with individual gene expressions is inadequate because a gene's expression is regulated by multiple biological processes and functional units. There is a need to understand the impact of changes at a higher level such as molecular function, cellular component, biological process, or pathway. A change at a higher level is usually measured with a set of gene expressions related to the biological process. That is, we need to model the outcome with gene sets as variable groups, and the gene sets may partially overlap.
In this thesis work, we investigate the impact of a penalized Cox regression procedure on regularization, parameter estimation, variable group selection, and nonparametric modeling of nonlinear effects with a time-to-event outcome.
We formulate the problem as a partially linear additive Cox model with high-dimensional data. We group genes into gene sets and approximate the nonparametric components by truncated series expansions with B-spline bases. After grouping and approximation, the problem of variable selection becomes that of selecting groups of coefficients in a gene set or in an approximation. We apply the group Lasso to obtain an initial solution path and reduce the dimension of the problem, and then update the whole solution path with the adaptive group Lasso. We also propose a generalized group Lasso method to provide more freedom in specifying the penalty and excluding covariates from being penalized.
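The group-level selection described above can be illustrated with the block soft-thresholding step that many group Lasso solvers use. This is a minimal sketch in Python, not the author's C/R implementation: the function name `group_soft_threshold`, the square-root-of-group-size scaling, and the per-group `weights` argument are assumptions made for illustration, and the groups are assumed not to overlap. A weight of zero leaves a group unpenalized, and weights derived from an initial estimate would give an adaptive variant.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam, weights=None):
    """Shrink each coefficient group toward zero by its Euclidean norm.

    beta    : 1-D array of coefficients
    groups  : list of index lists, one per (non-overlapping) group
    lam     : penalty level (hypothetical tuning parameter)
    weights : optional per-group weights; 0 means "do not penalize"
    """
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    for g, idx in enumerate(groups):
        idx = np.asarray(idx)
        w = 1.0 if weights is None else weights[g]
        norm = np.linalg.norm(beta[idx])
        # Threshold scaled by sqrt(group size), a common convention.
        thresh = lam * w * np.sqrt(len(idx))
        if norm > thresh:
            # The whole group survives, shrunk by a common factor.
            out[idx] = (1.0 - thresh / norm) * beta[idx]
        # Otherwise the whole group is set to zero (dropped together).
    return out

beta = np.array([3.0, 4.0, 0.1, 0.1, 2.0])
groups = [[0, 1], [2, 3], [4]]
shrunk = group_soft_threshold(beta, groups, lam=1.0, weights=[1.0, 1.0, 0.0])
```

In this usage example the second group has a small norm and is removed as a unit, while the third group has weight zero and passes through unchanged.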
A modified Newton-Raphson method is designed for stable and rapid computation. The core programs are written in the C language. A user-friendly R interface is implemented to perform all the calculations by calling the core programs.
We demonstrate the asymptotic properties of the proposed methods. Simulation studies are carried out to evaluate the finite sample performance of the proposed procedure using several tuning parameter selection methods for choosing the point on the solution path as the final estimator. We also apply the proposed approach to two real data examples.
|
2 |
Distributionally Robust Learning under the Wasserstein Metric
Chen, Ruidi 29 September 2019
This dissertation develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. The learning problems that are studied include: (i) Distributionally Robust Linear Regression (DRLR), which estimates a robustified linear regression plane by minimizing the worst-case expected absolute loss over a probabilistic ambiguity set characterized by the Wasserstein metric; (ii) Groupwise Wasserstein Grouped LASSO (GWGL), which aims at inducing sparsity at a group level when there exists a predefined grouping structure for the predictors, through defining a specially structured Wasserstein metric for DRO; (iii) Optimal decision making using DRLR informed K-Nearest Neighbors (K-NN) estimation, which selects among a set of actions the optimal one through predicting the outcome under each action using K-NN with a distance metric weighted by the DRLR solution; and (iv) Distributionally Robust Multivariate Learning, which solves a DRO problem with a multi-dimensional response/label vector, as in Multivariate Linear Regression (MLR) and Multiclass Logistic Regression (MLG), generalizing the univariate response model addressed in DRLR. A tractable DRO relaxation is derived for each problem, establishing a connection between robustness and regularization, and upper bounds are obtained on the prediction and estimation errors of the solution. The accuracy and robustness of the estimator are verified through a series of synthetic and real data experiments. The experiments with real data are all associated with various health informatics applications, an application area which motivated the work in this dissertation. In addition to estimation (regression and classification), this dissertation also considers outlier detection applications.
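The link between robustness and regularization mentioned above can be made concrete for the absolute loss. The following is a sketch under stated assumptions, not a quotation from the dissertation: the notation is introduced here for illustration, with N samples (x_i, y_i), empirical distribution \hat{P}_N, an order-1 Wasserstein ball of radius \epsilon defined with a norm \|\cdot\| on the joint (x, y) space, and \|\cdot\|_* denoting the dual norm. Because the loss |y - x^\top\beta| is Lipschitz in (x, y) with constant \|(-\beta, 1)\|_*, the worst-case expected loss is bounded by the empirical loss plus a norm penalty on the coefficients:

```latex
\sup_{Q:\; W_1(Q,\hat{P}_N)\le\epsilon}
  \mathbb{E}_Q\bigl[\,|y - x^\top\beta|\,\bigr]
\;\le\;
\frac{1}{N}\sum_{i=1}^{N} |y_i - x_i^\top\beta|
\;+\; \epsilon\,\|(-\beta,\,1)\|_*
```

Minimizing the right-hand side over \beta is a regularized least absolute deviation problem, which is one way the radius of the ambiguity set plays the role of a regularization parameter.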
|
3 |
Statistical Applications of Linear Programming for Feature Selection via Regularization Methods
Yao, Yonggang 01 October 2008
No description available.
|