  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Consistent bi-level variable selection via composite group bridge penalized regression

Seetharaman, Indu January 1900 (has links)
Master of Science / Department of Statistics / Kun Chen / We study composite group bridge penalized regression methods for conducting bi-level variable selection in high-dimensional linear regression models with a diverging number of predictors. The proposed method combines the ideas of bridge regression (Huang et al., 2008a) and group bridge regression (Huang et al., 2009) to achieve variable selection consistency at both the individual and group levels simultaneously, i.e., the important groups and the important individual variables within each group can both be correctly identified with probability approaching one as the sample size increases to infinity. The method takes full advantage of the prior grouping information, and the established bi-level oracle properties ensure that the method is immune to possible group misidentification. A related adaptive group bridge estimator, which uses adaptive penalization to improve bi-level selection, is also investigated. Simulation studies show that the proposed methods outperform many existing methods.
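The composite group bridge penalty combining the two bridge ideas can be sketched directly. A minimal illustration follows; the function and parameter names are our own, and the exact form used in the thesis may differ:

```python
import numpy as np

def composite_group_bridge_penalty(beta, groups, lam=1.0, gamma=0.5, tau=0.5):
    """Composite group bridge penalty: a bridge power gamma applied to each
    coefficient within a group, and a second bridge power tau applied to each
    group's sum. With gamma = 1 this reduces to the group bridge penalty; with
    tau = 1 and a single group it reduces to ordinary bridge regression."""
    total = 0.0
    for g in groups:                      # g: list of coefficient indices
        inner = np.sum(np.abs(beta[g]) ** gamma)
        total += inner ** tau
    return lam * total

beta = np.array([2.0, 0.0, 0.0, 1.0, 0.5])
groups = [[0, 1, 2], [3, 4]]
print(composite_group_bridge_penalty(beta, groups, lam=1.0, gamma=1.0, tau=0.5))
```

Powers gamma, tau in (0, 1) make the penalty concave, which is what drives exact zeros at both the group level and the individual level.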
2

Variable Selection and Parameter Estimation Using a Continuous and Differentiable Approximation to the L0 Penalty Function

VanDerwerken, Douglas Nielsen 10 March 2011 (has links) (PDF)
L0 penalized likelihood procedures like Mallows' Cp, AIC, and BIC directly penalize the number of variables included in a regression model. This is a straightforward approach to the problem of overfitting, and these methods are now part of every statistician's repertoire. However, these procedures have been shown to sometimes yield unstable parameter estimates as a result of the L0 penalty's discontinuity at zero. One proposed alternative, seamless-L0 (SELO), utilizes a continuous penalty function that mimics L0 and allows for stable estimates. Like other similar methods (e.g., LASSO and SCAD), SELO produces sparse solutions because the penalty function is non-differentiable at the origin. Because these penalized likelihoods are singular (non-differentiable) at zero, there is no closed-form solution for the extremum of the objective function. We propose a continuous and everywhere-differentiable penalty function that can have an arbitrarily steep slope in a neighborhood of zero, thus mimicking the L0 penalty while allowing a nearly closed-form solution for the beta-hat vector. Because our function is not singular at zero, beta-hat will have no zero-valued components, although some will have been shrunk arbitrarily close to zero. We therefore employ the BIC-selected tuning parameter used in the shrinkage step to perform zero-thresholding as well. We call the resulting vector of coefficients the ShrinkSet estimator. It is comparable to SELO in terms of model performance (selecting the truly nonzero coefficients, overall MSE, etc.), but we believe it to be more intuitive and simpler to compute. We provide strong evidence that the estimator enjoys favorable asymptotic properties, including the oracle property.
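The abstract does not give the penalty's formula, but the idea of a smooth L0 surrogate plus a zero-thresholding step can be illustrated with one common choice, b^2/(b^2 + a); this is an assumption for illustration, not the thesis's actual function:

```python
import numpy as np

def smooth_l0(beta, a=1e-3):
    """An everywhere-differentiable surrogate for the L0 penalty:
    b^2 / (b^2 + a) -> 1 when |b| >> sqrt(a) and -> 0 as b -> 0.
    Smaller `a` gives a steeper slope near zero, i.e. a closer L0 mimic,
    but never a singularity, so minimizers have no exactly-zero entries."""
    return beta**2 / (beta**2 + a)

def threshold(beta, tol):
    """Zero-thresholding step: coefficients shrunk close to zero but not
    exactly zero are snapped to zero (the tolerance would be tuned by BIC)."""
    out = beta.copy()
    out[np.abs(out) < tol] = 0.0
    return out

b = np.array([1.5, 0.02, -0.8, 1e-4])
print(smooth_l0(b))          # near 1 for large entries, near 0 for tiny ones
print(threshold(b, 0.05))    # small entries snapped exactly to zero
```

The two-stage structure (smooth shrinkage, then thresholding) is what lets the estimator keep a near-closed-form solution while still producing a sparse final coefficient vector.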
3

Regularization for High-dimensional Time Series Models

Sun, Yan 20 September 2011 (has links)
No description available.
4

Regularisation and variable selection using penalized likelihood.

El anbari, Mohammed 14 December 2011 (has links) (PDF)
We are interested in variable selection in linear regression models. This research is motivated by recent developments in microarrays, proteomics, and brain imaging, among others. We study this problem from both the frequentist and Bayesian viewpoints. In the frequentist framework, we propose methods to deal with the problem of variable selection when the number of variables is much larger than the sample size, possibly in the presence of additional structure among the predictor variables, such as high correlations or an ordering between successive variables. The performance of the proposed methods is theoretically investigated; we prove that, under regularity conditions, the proposed estimators possess good statistical properties, such as sparsity oracle inequalities, variable selection consistency and asymptotic normality. In the Bayesian framework, we propose a global noninformative approach to Bayesian variable selection. In this thesis, we pay special attention to two calibration-free hierarchical Zellner g-priors. The first is the Jeffreys prior, which is not location invariant. The second avoids this problem by considering only models with at least one variable. The practical performance of the proposed methods is illustrated through numerical experiments on simulated and real-world datasets, with a comparison between the Bayesian and frequentist approaches under a low-information constraint when the number of variables is almost equal to the number of observations.
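As background to the g-prior approach mentioned above, here is a sketch of Bayesian model comparison under a fixed Zellner g-prior, using the closed-form null-based Bayes factor popularized by Liang et al. (2008). This fixed-g version is only a stand-in for the hierarchical, calibration-free priors studied in the thesis, and all names below are illustrative:

```python
import numpy as np
from itertools import combinations

def g_prior_log_bayes_factor(y, X, g):
    """Log Bayes factor of the model with predictors X against the
    intercept-only null model under a fixed Zellner g-prior:
        BF = (1+g)^((n-k-1)/2) / (1 + g*(1-R^2))^((n-1)/2)
    where k is the number of predictors and R^2 the model's fit."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])        # add intercept
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 0.5 * (n - k - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=60)   # only variable 0 matters
scores = {S: g_prior_log_bayes_factor(y, X[:, list(S)], g=60.0)
          for r in range(1, 4) for S in combinations(range(3), r)}
print(max(scores, key=scores.get))                    # highest-scoring subset
```

The (1+g) term acts as the dimension penalty, so adding a spurious predictor lowers the Bayes factor unless it buys a real improvement in R^2.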
5

New approaches for high-dimensional multivariate GARCH models

Poignard, Benjamin 15 June 2017 (has links)
This document contributes to high-dimensional statistics for multivariate GARCH processes. First, the author proposes a new dynamic called vine-GARCH for correlation processes parameterized by an undirected graph called a vine. The proposed approach directly specifies positive-definite matrices and fosters parsimony. The author provides existence and uniqueness results for the stationary solution of the vine-GARCH model and studies its asymptotic properties. He then proposes a general framework for penalized M-estimators with dependent processes and focuses on the asymptotic properties of the adaptive Sparse Group Lasso regularizer. The high-dimensional setting is studied by considering a number of parameters that diverges with the sample size. The asymptotic properties are illustrated through simulation experiments. Finally, the author proposes to foster sparsity for multivariate variance-covariance matrix processes within the latter framework. To do so, the multivariate ARCH family is considered and the corresponding parameterizations are estimated via penalized ordinary least squares procedures.
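The final step exploits the fact that an ARCH(q) model is linear in the squared returns, so it can be fitted by penalized least squares. A minimal sketch, using a simple ISTA (proximal gradient) solver for the l1-penalized problem; the solver and all parameter values are illustrative, not the thesis's procedure:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, steps=3000):
    """l1-penalized least squares 0.5*||y - Xb||^2 + lam*||b||_1 via ISTA."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(steps):
        b = soft(b + X.T @ (y - X @ b) / L, lam / L)
    return b

# ARCH(q) as a linear model: r_t^2 ~ const + sum_j a_j * r_{t-j}^2.
# Simulated ARCH(1) data, so lags 2..5 are irrelevant and should shrink away.
rng = np.random.default_rng(1)
n, q = 2000, 5
r = np.zeros(n)
for t in range(1, n):
    sigma2 = 0.1 + 0.5 * r[t - 1] ** 2
    r[t] = np.sqrt(sigma2) * rng.normal()
y = r[q:] ** 2
X = np.column_stack([np.ones(n - q)] + [r[q - j:n - j] ** 2 for j in range(1, q + 1)])
coef = lasso_ista(X, y, lam=5.0)
print(np.round(coef, 3))                   # the lag-1 coefficient should dominate
```

The same linear-in-squares structure is what makes the penalized OLS approach tractable even when the number of lags (or cross terms, in the multivariate case) is large.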
6

Model Selection and Adaptive Lasso Estimation of Spatial Models

Liu, Tuo 07 December 2017 (has links)
No description available.
7

Regularisation and variable selection using penalized likelihood

El anbari, Mohammed 14 December 2011 (has links)
In this thesis we are interested in variable selection problems in linear regression. This work is motivated in particular by recent developments in genomics, proteomics, biomedical imaging, signal processing, image processing, marketing, etc. We examine the problem from both the frequentist and Bayesian points of view. In the frequentist framework, we propose methods for variable selection in situations where the number of variables can be much larger than the sample size, possibly with additional structure among the variables, such as strong correlations or an ordering between successive variables. The theoretical performance is explored; we show that, under certain regularity conditions, the proposed methods possess good statistical properties, such as sparsity oracle inequalities, variable selection consistency and asymptotic normality. In the Bayesian framework, we propose a global approach to variable selection in regression built on Zellner g-priors, in an approach similar but not identical to that of Liang et al. (2008). Our choice requires no calibration. We compare the Bayesian and frequentist regularization approaches in a low-information context where the number of variables is almost equal to the sample size.
8

Variable Selection for Linear and Smooth Transition Models via LASSO: Comparisons, Applications and New Methodology

CAMILA ROSA EPPRECHT 10 June 2016 (has links)
Variable selection in statistical models is an important problem, for which many different solutions have been proposed. Traditionally, one can choose the set of explanatory variables using information criteria or prior information, but the total number of models to evaluate grows exponentially as the number of candidate variables increases. An additional problem is the presence of more candidate variables than observations. In this thesis we study several aspects of the variable selection problem. First, we compare two procedures for linear regression: Autometrics, a general-to-specific (GETS) approach based on statistical tests, and LASSO, a shrinkage method. Different scenarios were considered in a simulation experiment, varying the sample size, the number of relevant variables and the number of candidate variables. In a real-data application, we compare the methods for US GDP forecasting. Second, we introduce a variable selection methodology for smooth transition regressive (STR) and autoregressive (STAR) models based on LASSO regularization, presenting both a direct and a stepwise approach. Both methods are tested with extensive simulation exercises and an application to genetic data. Finally, we introduce a penalized least squares criterion based on the LASSO l1 penalty and the CVaR (Conditional Value at Risk) of the out-of-sample regression errors. This is a quadratic optimization problem solved by interior point methods. In a simulation study in a linear regression framework, we show that the proposed method outperforms the LASSO when the data are contaminated by outliers, making it a robust method for estimation and variable selection.
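The CVaR-plus-l1 criterion described in the final part can be sketched by evaluating it directly. A minimal illustration (the function names and the choice of squared errors as the loss are our assumptions; the thesis solves the resulting quadratic program with interior point methods rather than evaluating the objective pointwise):

```python
import numpy as np

def empirical_cvar(losses, alpha=0.9):
    """Empirical CVaR_alpha: the mean of the worst (1-alpha) fraction of
    losses, i.e. the average loss beyond the alpha-quantile (the VaR)."""
    losses = np.sort(losses)
    k = int(np.ceil(alpha * len(losses)))
    return losses[k:].mean() if k < len(losses) else losses[-1]

def cvar_lasso_objective(beta, X, y, lam, alpha=0.9):
    """Penalized criterion in the spirit of the text: CVaR of the squared
    regression errors plus an l1 penalty on the coefficients. Focusing on
    the tail of the error distribution is what confers robustness to
    outliers relative to the plain LASSO's mean squared error."""
    errors = (y - X @ beta) ** 2
    return empirical_cvar(errors, alpha) + lam * np.abs(beta).sum()

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, 0.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.1, size=100)
print(cvar_lasso_objective(beta_true, X, y, lam=0.1))
```

Because CVaR is a convex risk measure and the l1 norm is convex, the full estimation problem stays convex, which is what makes the interior point formulation practical.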
