1 |
Contributions to statistical learning and its applications in personalized medicine
Valencia Arboleda, Carlos Felipe, 16 May 2013
This dissertation is broadly concerned with finding stable solutions to statistical models with a very large number of parameters and with analyzing their asymptotic statistical properties. In particular, it centers on regularization methods based on penalized estimation. These procedures define the estimator as the solution of an optimization problem that balances the fit to the data against the plausibility of the estimate. The first chapter studies a smoothness-regularization estimator for an infinite-dimensional parameter in an exponential family model with functional predictors. We focus on the Reproducing Kernel Hilbert Space (RKHS) approach and show that, despite the generality of the method, it achieves minimax optimal convergence rates. To carry out the asymptotic analysis of the estimator, we develop a simultaneous diagonalization tool for two positive definite operators: the kernel operator and the operator defined by the second Fréchet derivative of the expected data fit functional. Using this tool, we obtain sharper bounds on the minimax rates. The second chapter studies the statistical properties of the method of regularization with Radial Basis Functions in the context of linear inverse problems. Here the regularization serves two purposes: it produces a stable solution to the inverse problem, and it prevents over-fitting in the nonparametric estimation of the functional target. We consider different degrees of ill-posedness in the inversion of the operator A, namely mildly and severely ill-posed problems. We also study different types of radial basis kernels, classified by the strength of the penalization norm: Gaussian, multiquadric, and spline-type kernels. The third chapter deals with the problem of Individualized Treatment Rules (ITRs) and analyzes its solution through discriminant analysis.
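The simultaneous diagonalization idea can be illustrated in finite dimensions. The sketch below is a matrix analogue under our own assumptions, not the dissertation's operator-theoretic construction: two symmetric matrices stand in for the two operators, with `A` (positive definite) playing the role of the second-derivative data fit form and `B` the penalty form. The function names `simultaneous_diagonalize` and `penalized_solution` are hypothetical. A Cholesky factorization yields a basis `V` with `V.T @ A @ V = I` and `V.T @ B @ V` diagonal; the second function shows why this helps, since a quadratic penalized objective then decouples into independent coordinates with shrinkage factors `1 / (1 + eta * lam_k)`.

```python
import numpy as np


def simultaneous_diagonalize(A, B):
    """Return (V, lam) with V.T @ A @ V = I and V.T @ B @ V = diag(lam).

    A must be symmetric positive definite; B symmetric positive semidefinite.
    """
    L = np.linalg.cholesky(A)                         # A = L @ L.T
    M = np.linalg.solve(L, np.linalg.solve(L, B).T)   # M = L^{-1} B L^{-T}
    M = 0.5 * (M + M.T)                               # symmetrize against round-off
    lam, U = np.linalg.eigh(M)                        # M = U diag(lam) U.T
    V = np.linalg.solve(L.T, U)                       # V = L^{-T} U
    return V, lam


def penalized_solution(A, B, b, eta):
    """Minimize 0.5 x'Ax - b'x + 0.5 * eta * x'Bx in the common basis V.

    Writing x = V c, the objective separates coordinate-wise, giving
    c_k = (V'b)_k / (1 + eta * lam_k).
    """
    V, lam = simultaneous_diagonalize(A, B)
    c = (V.T @ b) / (1.0 + eta * lam)
    return V @ c


# Usage with random symmetric positive definite matrices
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
A = X @ X.T + 5 * np.eye(5)
Y = rng.standard_normal((5, 5))
B = Y @ Y.T + np.eye(5)
b = rng.standard_normal(5)
x_hat = penalized_solution(A, B, b, eta=0.3)
```

The decoupled solution agrees with solving `(A + eta * B) x = b` directly; the benefit of the diagonal form is that the effect of the penalty on each basis direction can be read off from `lam_k`, which is the kind of bookkeeping a rate analysis relies on.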
In the ITR problem, treatments are assigned based on each patient's prognostic covariates so as to maximize a reward function. We consider data generated from a randomized clinical trial. Because maximizing the empirical value function is an NP-hard computational problem, we instead estimate the decision rule directly by maximizing the expected value with a surrogate function in place of the original objective, which turns the optimization into a computationally feasible convex program. We derive necessary and sufficient conditions on the surrogate function for infinite-sample consistency in several scenarios: binary treatment selection, treatment selection with withholding, and multi-treatment selection.
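To make the surrogate idea concrete, the sketch below starts from the standard inverse-probability-weighted value estimate for a randomized trial and replaces the 0-1 indicator inside it with a convex loss. All modeling choices are assumptions: a binary treatment `A` coded in {-1, +1}, known randomization probabilities `prop`, nonnegative rewards `R`, a linear rule `d(x) = sign(b0 + x @ b)`, and the logistic loss as the surrogate. This is one common convex choice; it is not the dissertation's discriminant-analysis approach and does not check the consistency conditions derived there. The functions `empirical_value` and `fit_surrogate_itr` are hypothetical names.

```python
import numpy as np


def empirical_value(rule, X, A, R, prop):
    """IPW estimate of the mean reward if every patient followed `rule`."""
    return np.mean(R * (A == rule(X)) / prop)


def fit_surrogate_itr(X, A, R, prop, n_iter=2000, lr=0.1, ridge=1e-3):
    """Fit a linear ITR by gradient descent on a weighted logistic surrogate.

    Objective: mean(w_i * log(1 + exp(-A_i * f(X_i)))) + ridge/2 * ||b||^2,
    with f(x) = b0 + x @ b and weights w_i = R_i / prop_i (R assumed >= 0).
    """
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])    # intercept column + covariates
    w = R / prop
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        margin = A * (Z @ beta)
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)) = -(1 - tanh(m/2)) / 2
        dloss = -0.5 * (1.0 - np.tanh(0.5 * margin))
        grad = Z.T @ (w * dloss * A) / n
        grad[1:] += ridge * beta[1:]       # the intercept is not penalized
        beta -= lr * grad

    def rule(X_new):
        X_new = np.atleast_2d(X_new)
        scores = beta[0] + X_new @ beta[1:]
        return np.where(scores >= 0, 1, -1)

    return rule


# Simulated 1:1 trial: A = +1 is better when x > 0, A = -1 when x < 0
rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(-1, 1, size=(n, 1))
A = rng.choice([-1, 1], size=n)
prop = np.full(n, 0.5)
R = 3.0 + A * X[:, 0] + rng.uniform(-0.5, 0.5, size=n)
rule = fit_surrogate_itr(X, A, R, prop)
```

With nonnegative weights this is weighted logistic regression of `A` on `X` with weights `R / prop`, so any weighted classification solver could replace the hand-written gradient descent; maximizing the empirical value directly would instead require optimizing the discontinuous indicator `A == rule(X)`.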
2 |
A Concave Pairwise Fusion Approach to Clustering of Multi-Response Regression and Its Robust Extensions
Chen, Chen, 0000-0003-1175-3027, January 2022
Ma and Huang (2017) combine solution-path convex clustering with concave penalties to reduce clustering bias. Their method was introduced for single-response regression to handle heterogeneity, which may arise in either the regression intercepts or the regression slopes. The procedure, implemented with the alternating direction method of multipliers (ADMM), simultaneously identifies the grouping structure of the observations and estimates the regression coefficients.
In the first part of our work, we extend this procedure to multi-response regression. We propose models for heterogeneity in either the regression intercepts or the regression slopes, and we combine existing components of the ADMM algorithm with group-wise concave penalties to compute their solutions. Our approach improves both clustering accuracy and estimation accuracy. We also demonstrate the need for this extension: by exploiting information in the multi-dimensional response space, performance can be greatly improved.
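The ADMM structure described above can be sketched for the simplest multi-response case. Assumptions: a mean-only model `y_i = mu_i + error` with vector responses (no covariates, so the intercept and slope structures of the thesis are omitted), all pairwise differences penalized with a group minimax concave penalty (MCP), and a fixed ADMM parameter `theta`. The names `group_mcp_threshold` and `concave_fusion_cluster` are hypothetical. The updates follow the usual pattern: a linear solve for the centers, a closed-form group-MCP thresholding of each pairwise difference, and a dual ascent step; observations whose pairwise difference is fused to zero are placed in the same cluster.

```python
import numpy as np


def group_mcp_threshold(delta, lam, gamma, theta):
    """Minimize MCP(||eta||; lam, gamma) + theta/2 * ||delta - eta||^2 over eta.

    The closed form below assumes gamma * theta > 1.
    """
    norm = np.linalg.norm(delta)
    if norm > gamma * lam:
        return delta.copy()                         # beyond the MCP knot: no shrinkage
    if norm == 0.0:
        return np.zeros_like(delta)
    shrink = max(0.0, 1.0 - lam / (theta * norm))   # group soft-thresholding
    return shrink * delta / (1.0 - 1.0 / (gamma * theta))


def concave_fusion_cluster(Y, lam, gamma=3.0, theta=1.0, n_iter=500, tol=1e-8):
    """Cluster the rows of Y (n x q) by fusing their centers mu_i with a group MCP."""
    if gamma * theta <= 1.0:
        raise ValueError("need gamma * theta > 1")
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # With D the pairwise-difference matrix over all pairs, D^T D = n I - 1 1^T.
    lhs = np.eye(n) + theta * (n * np.eye(n) - np.ones((n, n)))
    eta = np.array([Y[i] - Y[j] for i, j in pairs])
    v = np.zeros_like(eta)
    mu = Y.copy()
    for _ in range(n_iter):
        # mu-update: (I + theta D^T D) mu = Y + D^T (theta * eta - v)
        rhs = Y.copy()
        for k, (i, j) in enumerate(pairs):
            rhs[i] += theta * eta[k] - v[k]
            rhs[j] -= theta * eta[k] - v[k]
        mu = np.linalg.solve(lhs, rhs)
        # eta-update: pairwise group-MCP thresholding
        diff = np.array([mu[i] - mu[j] for i, j in pairs])
        eta = np.array([group_mcp_threshold(d + vk / theta, lam, gamma, theta)
                        for d, vk in zip(diff, v)])
        # dual update and primal-residual stopping rule
        resid = diff - eta
        v = v + theta * resid
        if np.linalg.norm(resid) < tol:
            break
    # Rows i and j share a cluster when eta_ij is exactly zero (union-find)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for k, (i, j) in enumerate(pairs):
        if not np.any(eta[k]):
            parent[find(i)] = find(j)
    relabel = {}
    labels = np.array([relabel.setdefault(find(i), len(relabel)) for i in range(n)])
    return mu, labels
```

Taking connected components of the fused pairs guarantees a valid partition even when the iterations stop before every fused relation is transitive. On noisy data the number of recovered clusters depends on `lam`, which in practice is chosen along a solution path, for example by an information criterion.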
In the second part, we develop robust versions of the proposed method, using two approaches to handle outliers or long-tailed distributions. The first replaces the squared loss with a robust loss, such as the absolute loss or the Huber loss. The second characterizes and removes the effect of outliers through a mean-shift vector. We show that these robust methods outperform the squared-loss method when outliers are present or when the underlying distribution is long-tailed. / Statistics
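The two robust strategies can be compared on the simplest possible model. The sketch below assumes a single common location instead of clustered regression, a Huber threshold `c`, and an L1 penalty `lam` on the mean-shift vector `s`; the penalty used in the thesis may differ, and the names `huber_location` and `mean_shift_location` are hypothetical. With this L1 choice the two strategies coincide, because minimizing `0.5 * (r - s)**2 + lam * abs(s)` over `s` gives `r**2 / 2` when `abs(r) <= lam` and `lam * abs(r) - lam**2 / 2` otherwise, which is exactly the Huber loss with threshold `lam`.

```python
import numpy as np


def huber_location(y, c, n_iter=1000, tol=1e-12):
    """Huber M-estimate of a common location (threshold c) via IRLS."""
    y = np.asarray(y, dtype=float)
    mu = np.median(y)
    for _ in range(n_iter):
        r = np.abs(y - mu)
        w = np.minimum(1.0, c / np.maximum(r, 1e-300))  # Huber weights psi(r) / r
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu


def mean_shift_location(y, lam, n_iter=1000, tol=1e-12):
    """Minimize 0.5 * sum((y - mu - s)^2) + lam * sum(|s|) over mu and s."""
    y = np.asarray(y, dtype=float)
    mu = np.mean(y)
    s = np.zeros_like(y)
    for _ in range(n_iter):
        r = y - mu
        s = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)  # s-update: soft-threshold
        mu_new = np.mean(y - s)                            # mu-update: mean of shifted data
        if abs(mu_new - mu) < tol:
            return mu_new, s
        mu = mu_new
    return mu, s


y = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 20.0])   # one gross outlier
mu_ls = np.mean(y)
mu_huber = huber_location(y, c=1.0)
mu_shift, s_hat = mean_shift_location(y, lam=1.0)
```

Here the squared-loss estimate is pulled well above 3 by the single outlier, while both robust estimates agree on 0.25, and the mean-shift vector flags the last observation with a large nonzero entry.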
3 |
Exponential weighted aggregation: oracle inequalities and algorithms
Luu, Duy tung, 23 November 2017
In many areas of statistics, including signal and image processing, high-dimensional estimation is an important task for recovering an object of interest. However, in the overwhelming majority of cases, the recovery problem is ill-posed. Fortunately, even if the ambient dimension of the object to be restored (signal, image, video) is very large, its intrinsic "complexity" is generally small. This prior information can be incorporated through two approaches: (i) penalization (very popular) and (ii) aggregation by exponential weighting (EWA). The penalized approach seeks an estimator that minimizes a data-fidelity term penalized by a term promoting objects of low complexity (simple objects). EWA combines a family of pre-estimators, each associated with a weight that exponentially favors pre-estimators promoting the same low-complexity objects.

This manuscript consists of two parts: a theoretical part and an algorithmic part. In the theoretical part, we first propose EWA with a new family of priors promoting analysis-group sparse signals, whose performance is guaranteed by oracle inequalities. Next, we analyze the penalized estimator and EWA, with general priors promoting simple objects, within a unified framework that yields theoretical guarantees. Two types of guarantees are established: (i) prediction oracle inequalities and (ii) estimation bounds. We then instantiate them for particular cases, some of which have been studied in the literature. In the algorithmic part, we propose an implementation of these estimators that combines Monte Carlo simulation (Langevin diffusion) with proximal splitting algorithms, and we establish their convergence guarantees. Several numerical experiments illustrate our theoretical guarantees and our algorithms.
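The exponential-weighting principle is easiest to see with a finite family of pre-estimators. The sketch below is a simplified discrete variant under stated assumptions, not the continuous-prior, Langevin-based estimator of the thesis: a Gaussian sequence model `y = theta + sigma * noise` with known `sigma`, nested projection pre-estimators that keep the first `k` coordinates (a fixed, data-independent family), Stein's unbiased risk estimate (SURE) as each pre-estimator's risk, a uniform prior over `k`, and a user-chosen temperature `beta`. The name `ewa_projection_aggregate` is hypothetical.

```python
import numpy as np


def ewa_projection_aggregate(y, sigma, beta):
    """Exponentially weighted aggregate of nested projection estimators.

    Pre-estimator k keeps y[:k] and zeroes the rest. Its SURE, up to the
    constant -n * sigma**2, is ||y - theta_k||^2 + 2 * sigma**2 * k.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    estimates = np.zeros((n + 1, n))
    risk = np.zeros(n + 1)
    for k in range(n + 1):
        estimates[k, :k] = y[:k]
        risk[k] = np.sum(y[k:] ** 2) + 2.0 * sigma**2 * k
    logw = -risk / beta          # uniform prior over k (assumption)
    logw -= logw.max()           # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum()
    return w @ estimates, w


y = np.array([5.0, 4.0, 0.1, -0.1])
theta_hat, weights = ewa_projection_aggregate(y, sigma=1.0, beta=4.0)
```

As `beta` shrinks toward zero, the weights concentrate on the pre-estimator with the smallest estimated risk, which recovers penalized model selection; larger `beta` averages over neighboring models, which is the behavior EWA oracle inequalities quantify.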