Global ETD Search

1	Robust mixture modeling Yu, Chun January 1900 (has links) Doctor of Philosophy / Department of Statistics / Weixin Yao and Kun Chen / Ordinary least-squares (OLS) estimators for a linear model are very sensitive to unusual values in the design space or outliers among y values. Even a single atypical value may have a large effect on the parameter estimates. In this proposal, we first review and describe some available and popular robust techniques, including some recently developed ones, and compare them in terms of breakdown point and efficiency. In addition, we use a simulation study and a real data application to compare the performance of existing robust methods under different scenarios. Finite mixture models are widely applied to a variety of random phenomena. However, inference for mixture models is challenging when outliers exist in the data. The traditional maximum likelihood estimator (MLE) is sensitive to outliers. We propose Robust Mixture via Mean shift penalization (RMM) for mixture models and Robust Mixture Regression via Mean shift penalization (RMRM) for mixture regression, to achieve simultaneous outlier detection and parameter estimation. A mean shift parameter is added to the mixture models and penalized by a nonconvex penalty function. With this model setting, we develop an iterative thresholding embedded EM algorithm to maximize the penalized objective function. Compared with other existing robust methods, the proposed methods show outstanding performance in both identifying outliers and estimating the parameters. Robust Outlier detection Mixture models EM algorithm Penalized likelihood Statistics (0463)
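As a rough illustration of the mean-shift idea in the abstract above, the following Python sketch applies it to a single-component location model rather than a mixture. Each observation gets its own shift parameter, and the fit alternates between re-estimating the mean and thresholding the residuals. This is not the thesis's RMM/RMRM algorithm: the simplified model, the hard-thresholding rule, the threshold `lam`, and all function names are assumptions made for illustration.

```python
def hard_threshold(r, lam):
    """Keep a residual as a mean shift only if it exceeds lam in magnitude."""
    return r if abs(r) > lam else 0.0


def mean_shift_location(y, lam, n_iter=50):
    """Alternate between estimating mu and the per-observation shifts gamma.

    Assumed model (illustrative only): y_i = mu + gamma_i + noise,
    with most gamma_i equal to zero.
    """
    gamma = [0.0] * len(y)
    mu = 0.0
    for _ in range(n_iter):
        # Estimate the mean from the shift-corrected observations.
        mu = sum(yi - gi for yi, gi in zip(y, gamma)) / len(y)
        # Re-estimate the shifts: large residuals are flagged as outliers.
        gamma = [hard_threshold(yi - mu, lam) for yi in y]
    return mu, gamma


y = [0.1, -0.2, 0.0, 0.2, -0.1, 10.0]
mu, gamma = mean_shift_location(y, lam=3.0)
outliers = [i for i, g in enumerate(gamma) if g != 0.0]
print(mu, outliers)
```

A nonzero shift marks an observation as an outlier, so detection and estimation happen in the same loop. In a mixture setting the mean update would be replaced by EM steps, which this sketch leaves out.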
2	Seleção de covariáveis para modelos de sobrevivência via verossimilhança penalizada / Variable selection for survival models based on penalized likelihood Pinto Junior, Jony Arrais 18 February 2009 (has links) Variable selection is an important step in building a parsimonious model. However, the most popular variable selection techniques, such as best subset selection and the stepwise method, do not take into account the stochastic errors inherent in the variable selection step. This work studies alternatives to the more popular methods for the Cox proportional hazards model and the Cox model with gamma frailty. The alternative methods are based on penalized likelihood and differ from the usual variable selection methods in that they aim to exclude non-significant variables from the model by estimating their coefficients as zero. The resulting estimator has desirable properties given appropriate choices of the penalty function and the tuning parameter. These methods were assessed through simulations, and an application to a real data set was considered. penalized likelihood penalty functions variable selection
3	Verossimilhança hierárquica em modelos de fragilidade / Hierarchical likelihood in frailty models Amorim, William Nilson de 12 February 2015 (has links) Estimation procedures for frailty models have been widely discussed in the statistical literature due to their widespread use in survival analysis. Several estimation methods have been developed: procedures based on the EM algorithm, Markov chain Monte Carlo, partial likelihood, penalized likelihood and quasi-likelihood, among others. An alternative currently in use is the hierarchical likelihood. The main objective of this work was to study the advantages and disadvantages of the hierarchical likelihood for inference in frailty models compared with the penalized likelihood method, currently the most widely used. We applied both approaches to a real data set using the statistical packages available in R, and we performed a simulation study to compare the bias and the mean squared error of the estimates from each approach. Both methodologies produced very similar estimates, especially for the fixed effects. In practice, the largest difference was the running time of the estimation algorithm, which was much higher for the hierarchical approach. Frailty models Hierarchical likelihood Penalized likelihood Survival analysis
4	Verossimilhança hierárquica em modelos de fragilidade / Hierarchical likelihood in frailty models William Nilson de Amorim 12 February 2015 (has links) Estimation procedures for frailty models have been widely discussed in the statistical literature due to their widespread use in survival analysis. Several estimation methods have been developed: procedures based on the EM algorithm, Markov chain Monte Carlo, partial likelihood, penalized likelihood and quasi-likelihood, among others. An alternative currently in use is the hierarchical likelihood. The main objective of this work was to study the advantages and disadvantages of the hierarchical likelihood for inference in frailty models compared with the penalized likelihood method, currently the most widely used. We applied both approaches to a real data set using the statistical packages available in R, and we performed a simulation study to compare the bias and the mean squared error of the estimates from each approach. Both methodologies produced very similar estimates, especially for the fixed effects. In practice, the largest difference was the running time of the estimation algorithm, which was much higher for the hierarchical approach. Frailty models Hierarchical likelihood Penalized likelihood Survival analysis
5	Seleção de covariáveis para modelos de sobrevivência via verossimilhança penalizada / Variable selection for survival models based on penalized likelihood Jony Arrais Pinto Junior 18 February 2009 (has links) Variable selection is an important step in building a parsimonious model. However, the most popular variable selection techniques, such as best subset selection and the stepwise method, do not take into account the stochastic errors inherent in the variable selection step. This work studies alternatives to the more popular methods for the Cox proportional hazards model and the Cox model with gamma frailty. The alternative methods are based on penalized likelihood and differ from the usual variable selection methods in that they aim to exclude non-significant variables from the model by estimating their coefficients as zero. The resulting estimator has desirable properties given appropriate choices of the penalty function and the tuning parameter. These methods were assessed through simulations, and an application to a real data set was considered. penalized likelihood penalty functions variable selection
6	Partly parametric generalized additive model Zhang, Tianyang 01 December 2010 (has links) In many scientific studies, the response variable bears a generalized nonlinear regression relationship with a certain covariate of interest, which may, however, be confounded by other covariates with unknown functional form. We propose a new class of models, the partly parametric generalized additive model (PPGAM) for doing generalized nonlinear regression with the confounding covariate effects adjusted nonparametrically. To avoid the curse of dimensionality, the PPGAM specifies that, conditional on the covariates, the response distribution belongs to the exponential family with the mean linked to an additive predictor comprising a nonlinear parametric function that is of main interest, plus additive, smooth functions of other covariates. The PPGAM extends both the generalized additive model (GAM) and the generalized nonlinear regression model. We propose to estimate a PPGAM by the method of penalized likelihood. We derive some asymptotic properties of the penalized likelihood estimator, including consistency and asymptotic normality of the parametric estimator of the nonlinear regression component. We propose a model selection criterion for the PPGAM, which resembles the BIC. We illustrate the new methodologies by simulations and real applications. We have developed an R package PPGAM that implements the methodologies expounded herein. Ecological Exponential families GAM Nonlinear Regression Penalized Likelihood Semiparametric Statistics and Probability
7	Statistical detection with weak signals via regularization Li, Jinzheng 01 July 2012 (has links) There has been increasing interest in uncovering smuggled nuclear materials in connection with the War on Terror. Detection of special nuclear materials (SNM) hidden in cargo containers is a major challenge in national and international security. We propose a new physics-based method to determine the presence of the spectral signature of one or more nuclides in poorly resolved spectra with weak signatures. The method differs from traditional methods, which rely primarily on peak-finding algorithms. The new approach considers each of the signatures in the library to be a linear combination of subspectra. These subspectra are obtained by assuming a signature consisting of just one of the unique gamma rays emitted by the nuclei. We propose a Poisson regression model for deducing which nuclei are present in the observed spectrum. In recognition that a radiation source generally comprises few nuclear materials, the underlying Poisson model is sparse, i.e. most of the regression coefficients are zero (positive coefficients correspond to the presence of nuclear materials). We develop an iterative algorithm for penalized likelihood estimation that promotes sparsity. We illustrate the efficacy of the proposed method by simulations using a variety of poorly resolved, low signal-to-noise ratio (SNR) situations, which show that the proposed approach enjoys excellent empirical performance even with SNR as low as -15 dB. The proposed method is shown to be variable-selection consistent, in the framework of increasing detection time and under mild regularity conditions. We study the problem of testing for shielding, i.e. the presence of intervening materials that attenuate the gamma-ray signal. We show that, as detection time increases to infinity, the Lagrange multiplier test, the likelihood ratio test and the Wald test are asymptotically equivalent under the null hypothesis, and that their asymptotic null distribution is chi-square. We also derive the local power of these tests. We also develop a nonparametric approach for detecting spectra indicative of the presence of SNM. This approach characterizes the shape change in a spectrum from background radiation. We do this by proposing a dissimilarity function that characterizes the complete shape change of a spectrum from the background, over all energy channels. We derive the asymptotic null distributions of the tests in terms of functionals of the Brownian bridge. Simulation results show that the proposed approach is very powerful and promising for detecting weak signals. It is able to accurately detect weak signals with SNR as low as -37 dB. gamma-ray spectrum Hypothesis Testing penalized likelihood estimation Poisson regression sparsity weak signal detection Statistics and Probability
8	Semiparametric regression analysis of zero-inflated data Liu, Hai 01 July 2009 (has links) Zero-inflated data abound in ecological studies as well as in other scientific and quantitative fields. Nonparametric regression with zero-inflated response may be studied via the zero-inflated generalized additive model (ZIGAM). ZIGAM assumes that the conditional distribution of the response variable belongs to the zero-inflated 1-parameter exponential family which is a probabilistic mixture of the zero atom and the 1-parameter exponential family, where the zero atom accounts for an excess of zeroes in the data. We propose the constrained zero-inflated generalized additive model (COZIGAM) for analyzing zero-inflated data, with the further assumption that the probability of non-zero-inflation is some monotone function of the (non-zero-inflated) exponential family distribution mean. When the latter assumption obtains, the new approach provides a unified framework for modeling zero-inflated data, which is more parsimonious and efficient than the unconstrained ZIGAM. We develop an iterative algorithm for model estimation based on the penalized likelihood approach, and derive formulas for constructing confidence intervals of the maximum penalized likelihood estimator. Some asymptotic properties including the consistency of the regression function estimator and the limiting distribution of the parametric estimator are derived. We also propose a Bayesian model selection criterion for choosing between the unconstrained and the constrained ZIGAMs. We consider several useful extensions of the COZIGAM, including imposing additive-component-specific proportional and partial constraints, and incorporating threshold effects to account for regime shift phenomena. The new methods are illustrated with both simulated data and real applications. An R package COZIGAM has been developed for model fitting and model selection with zero-inflated data. 
Asymptotic normality Constrained model EM algorithm Model selection Penalized likelihood Threshold model Statistics and Probability
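To make the zero-inflated family described in the abstract above concrete, the sketch below evaluates the probability mass function of a zero-inflated Poisson distribution, one member of that family. The Poisson choice, the parameter names `p_regular` and `rate`, and the function name are assumptions for illustration. The COZIGAM additionally ties the non-zero-inflation probability to the mean through a monotone function, which is not modeled here.

```python
import math


def zip_pmf(k, p_regular, rate):
    """P(Y = k) for a zero-inflated Poisson (illustrative parameterization).

    With probability 1 - p_regular the response is a structural zero;
    with probability p_regular it is drawn from Poisson(rate).
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # The zero atom adds mass on top of the ordinary Poisson zero.
        return (1.0 - p_regular) + p_regular * poisson
    return p_regular * poisson


# The probabilities should sum to (numerically) one over the support.
total = sum(zip_pmf(k, p_regular=0.7, rate=2.0) for k in range(60))
print(zip_pmf(0, 0.7, 2.0), total)
```

The excess of zeros shows up as `zip_pmf(0, ...)` being larger than the plain Poisson probability of zero at the same rate.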
9	Mixture distributions with application to microarray data analysis Lynch, O'Neil 01 June 2009 (has links) The main goal in analyzing microarray data is to determine the genes that are differentially expressed across two types of tissue samples or samples obtained under two experimental conditions. In this dissertation we proposed two methods to determine differentially expressed genes. For the penalized normal mixture model (PMMM) to determine genes that are differentially expressed, we penalized both the variance and the mixing proportion parameters simultaneously. The variance parameter was penalized so that the log-likelihood would be bounded, while the mixing proportion parameter was penalized so that its estimates are not on the boundary of its parameter space. The null distribution of the likelihood ratio test statistic (LRTS) was simulated so that we could perform a hypothesis test for the number of components of the penalized normal mixture model. In addition to simulating the null distribution of the LRTS for the penalized normal mixture model, we showed that the maximum likelihood estimates were asymptotically normal, which is a necessary first step toward proving the asymptotic null distribution of the LRTS. This result is a significant contribution to the field of normal mixture models. The modified p-value approach for detecting differentially expressed genes was also discussed in this dissertation. The modified p-value approach was implemented so that a hypothesis test for the number of components can be conducted by using the modified likelihood ratio test. In the modified p-value approach we penalized the mixing proportion so that the estimates of the mixing proportion are not on the boundary of its parameter space. The null distribution of the LRTS was simulated so that the number of components of the uniform beta mixture model can be determined. Finally, both methods, the penalized normal mixture model and the modified p-value approach, were applied to simulated and real data. Likelihood ratio test Modified likelihood Penalized likelihood Asymptotic chi-square distribution Consistency American Studies Arts and Humanities
10	Variable Selection and Parameter Estimation Using a Continuous and Differentiable Approximation to the L0 Penalty Function VanDerwerken, Douglas Nielsen 10 March 2011 (has links) (PDF) L0 penalized likelihood procedures like Mallows' Cp, AIC, and BIC directly penalize for the number of variables included in a regression model. This is a straightforward approach to the problem of overfitting, and these methods are now part of every statistician's repertoire. However, these procedures have been shown to sometimes result in unstable parameter estimates as a result of the L0 penalty's discontinuity at zero. One proposed alternative, seamless-L0 (SELO), utilizes a continuous penalty function that mimics L0 and allows for stable estimates. Like other similar methods (e.g. LASSO and SCAD), SELO produces sparse solutions because the penalty function is non-differentiable at the origin. Because these penalized likelihoods are singular (non-differentiable) at zero, there is no closed-form solution for the extremum of the objective function. We propose a continuous and everywhere-differentiable penalty function that can have an arbitrarily steep slope in a neighborhood of zero, thus mimicking the L0 penalty, but allowing for a nearly closed-form solution for the beta-hat vector. Because our function is not singular at zero, beta-hat will have no zero-valued components, although some will have been shrunk arbitrarily close thereto. We employ a BIC-selected tuning parameter used in the shrinkage step to perform zero-thresholding as well. We call the resulting vector of coefficients the ShrinkSet estimator. It is comparable to SELO in terms of model performance (selecting the truly nonzero coefficients, overall MSE, etc.), but we believe it to be more intuitive and simpler to compute. We provide strong evidence that the estimator enjoys favorable asymptotic properties, including the oracle property. 
Penalized likelihood variable selection oracle property large p small n Statistics and Probability
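The contrast between the discontinuous L0 penalty and a continuous surrogate, as discussed in the abstract above, can be sketched numerically. The code below compares the L0 penalty with the SELO penalty in the form it is usually stated, (lam / log 2) * log(|beta| / (|beta| + tau) + 1). That formula, the parameter names `lam` and `tau`, and the function names are assumptions for illustration. The ShrinkSet penalty is not given in the abstract, so it is not reproduced here.

```python
import math


def l0_penalty(beta, lam):
    """L0 penalty: a fixed cost lam for any nonzero coefficient."""
    return lam if beta != 0.0 else 0.0


def selo_penalty(beta, lam, tau):
    """SELO-style continuous surrogate (assumed form, see lead-in).

    Equals 0 at beta = 0 and approaches lam as |beta| grows relative to tau.
    """
    b = abs(beta)
    return (lam / math.log(2.0)) * math.log(b / (b + tau) + 1.0)


# Smaller tau makes the surrogate rise more steeply near zero.
for beta in (0.0, 0.001, 0.1, 1.0):
    print(beta, l0_penalty(beta, 1.0), round(selo_penalty(beta, 1.0, tau=0.01), 4))
```

The L0 penalty jumps from 0 to `lam` at the origin, while the surrogate moves there continuously. That continuity is what the abstract credits for more stable estimates.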
