About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Verossimilhança hierárquica em modelos de fragilidade / Hierarchical likelihood in frailty models

Amorim, William Nilson de, 12 February 2015
Estimation procedures for frailty models have been widely discussed in the statistical literature because of their extensive use in survival analysis. Several methods for estimating the model parameters have been developed: procedures based on the EM algorithm, Markov chain Monte Carlo, and estimation based on partial likelihood, penalized likelihood, and quasi-likelihood, among others. An alternative that has come into current use is the hierarchical likelihood. The main objective of this work was to study the advantages and disadvantages of the hierarchical likelihood for inference in frailty models, compared with the penalized likelihood, currently the most widely used method. We applied both methodologies to a real data set, using the statistical packages available in the R software, and carried out a simulation study to compare the bias and mean squared error of the estimates from each approach. The two methodologies produced very similar estimates, especially for the fixed effects. From a practical point of view, the largest difference found was the running time of the estimation algorithm, which was much longer for the hierarchical approach.
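For orientation, a minimal sketch of the objects being compared in this abstract — a shared frailty model and the hierarchical (h-)likelihood built from it — written in standard notation that is assumed here rather than taken from the thesis. The hazard for subject $j$ in cluster $i$ is $\lambda_{ij}(t \mid u_i) = \lambda_0(t)\, u_i \exp(x_{ij}^{\top}\beta)$, where $u_i$ is the unobserved frailty shared within cluster $i$. The h-likelihood augments the conditional log-likelihood of the censored data with the log-density of the frailties, $h(\beta, \theta, u) = \sum_{i,j} \log f(t_{ij}, \delta_{ij} \mid u_i; \beta) + \sum_{i} \log f(u_i; \theta)$, and estimates $\beta$ and $u$ by joint maximization; roughly speaking, the penalized likelihood approach instead treats the frailty log-density as a penalty added to the partial likelihood.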
22

Seleção de modelos para segmentação de sequências simbólicas usando máxima verossimilhança penalizada / A model selection criterion for the segmentation of symbolic sequences using penalized maximum likelihood

Castro, Bruno Monte de, 20 February 2013
The sequence segmentation problem aims to partition a sequence, or a set of sequences, into a finite number of segments that are as homogeneous as possible. In this work we consider the problem of segmenting a set of random sequences, taking values in a finite alphabet $\mathcal{A}$, into a finite number of independent blocks. We suppose we have $m$ independent sequences of length $n$, constructed by the concatenation of $s$ segments of lengths $l^{*}_j$, where each block is drawn from a distribution $\mathbb{P}_j$ over $\mathcal{A}^{l^{*}_j}$, $j=1,\cdots,s$. We denote the true cut points by the vector ${\bf k}^{*}=(k^{*}_1,\cdots,k^{*}_{s-1})$, with $k^{*}_i=\sum_{j=1}^{i} l^{*}_j$, $i=1,\cdots,s-1$; these points mark the change from one segment to the next. We propose a penalized maximum likelihood criterion to infer simultaneously the number of cut points and the position of each of them. We also present a sequence segmentation algorithm and report simulations that illustrate how it works and its speed of convergence. Our main result is a proof of the strong consistency of the cut point estimator as $m$ tends to infinity.
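For reference, the abstract does not reproduce the penalty itself; a penalized maximum likelihood criterion of this kind typically takes the form $(\hat{s}, \hat{{\bf k}}) = \arg\max_{s,\,{\bf k}} \{ \log L_m({\bf k}) - (s-1)\,\mathrm{pen}(m, n) \}$, where $\log L_m({\bf k})$ is the log-likelihood of the $m$ sequences maximized under the segmentation induced by the candidate cut points ${\bf k}$, and the penalty term grows with the number of cut points to discourage over-segmentation. The penalty rate needed for the strong consistency result is part of the thesis's contribution and is not given here.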
25

Seleção de covariáveis para modelos de sobrevivência via verossimilhança penalizada / Variable selection for survival models based on penalized likelihood

Jony Arrais Pinto Junior, 18 February 2009
Variable selection is an important step in building a parsimonious model. However, the most popular variable selection techniques, such as best subset selection and the stepwise method, ignore the stochastic errors inherent in the variable selection stage. In this work, alternative procedures to the most popular methods were studied for the Cox proportional hazards model and for the Cox model with gamma frailty. The alternative methods are based on penalized likelihood and differ from the usual variable selection methods in that they aim to exclude nonsignificant variables from the model by estimating their coefficients as zero. The resulting estimator has desirable properties for appropriate choices of the penalty function and the tuning parameter. These methods were assessed by simulation, and an application to a real data set is also considered.
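A minimal sketch of this class of methods, in the standard notation of the penalized variable selection literature (the abstract does not name the specific penalties studied): the estimator maximizes $\ell_P(\beta) = \ell(\beta) - n \sum_{j=1}^{p} p_{\lambda}(|\beta_j|)$, where $\ell(\beta)$ is the Cox log partial likelihood and $p_{\lambda}$ is a penalty function, such as the LASSO penalty $p_{\lambda}(t) = \lambda t$ or the SCAD penalty, whose singularity at the origin forces small estimated coefficients to be exactly zero, so that selection and estimation occur in a single step.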
26

Partly parametric generalized additive model

Zhang, Tianyang, 01 December 2010
In many scientific studies, the response variable bears a generalized nonlinear regression relationship with a certain covariate of interest, which may, however, be confounded by other covariates with unknown functional form. We propose a new class of models, the partly parametric generalized additive model (PPGAM), for performing generalized nonlinear regression with the confounding covariate effects adjusted nonparametrically. To avoid the curse of dimensionality, the PPGAM specifies that, conditional on the covariates, the response distribution belongs to the exponential family with the mean linked to an additive predictor comprising a nonlinear parametric function that is of main interest, plus additive smooth functions of other covariates. The PPGAM extends both the generalized additive model (GAM) and the generalized nonlinear regression model. We propose to estimate a PPGAM by the method of penalized likelihood. We derive some asymptotic properties of the penalized likelihood estimator, including consistency and asymptotic normality of the parametric estimator of the nonlinear regression component. We propose a model selection criterion for the PPGAM that resembles the BIC. We illustrate the new methodologies with simulations and real applications. We have developed an R package, PPGAM, that implements the methodologies expounded herein.
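One common way to write a model of this form, with notation assumed for illustration rather than taken from the thesis: conditional on the covariates, $y_i$ follows an exponential family distribution with mean $\mu_i$ linked to $g(\mu_i) = f(z_i; \theta) + \sum_{k=1}^{K} s_k(x_{ik})$, where $f(\cdot;\theta)$ is the parametric nonlinear component of primary interest and the $s_k$ are smooth functions of the confounding covariates. Penalized likelihood estimation then maximizes $\ell(\theta, s_1,\ldots,s_K) - \tfrac{1}{2}\sum_{k}\lambda_k \int s_k''(x)^2\,dx$, with the smoothing parameters $\lambda_k$ controlling the roughness of each additive term.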
27

Regularized methods for high-dimensional and bi-level variable selection

Breheny, Patrick John, 01 July 2009
Many traditional approaches cease to be useful when the number of variables is large in comparison with the sample size. Penalized regression methods have proved to be an attractive approach, both theoretically and empirically, for dealing with these problems. This thesis focuses on the development of penalized regression methods for high-dimensional variable selection. The first part of this thesis deals with problems in which the covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. I introduce a framework for grouped penalization that encompasses the previously proposed group lasso and group bridge methods, sheds light on the behavior of grouped penalties, and motivates the proposal of a new method, group MCP. The second part of this thesis develops fast algorithms for fitting models with complicated penalty functions such as grouped penalization methods. These algorithms combine the idea of local approximation of penalty functions with recent research into coordinate descent algorithms to produce highly efficient numerical methods for fitting models with complicated penalties. Importantly, I show these algorithms to be both stable and linear in the dimension of the feature space, allowing them to be efficiently scaled up to very large problems. In the third part of this thesis, I extend the idea of false discovery rates to penalized regression. The Karush-Kuhn-Tucker conditions describing penalized regression estimates provide testable hypotheses involving partial residuals. I use these hypotheses to connect the previously disparate fields of multiple comparisons and penalized regression, develop estimators for the false discovery rates of methods such as the lasso and elastic net, and establish theoretical results. Finally, the methods from all three parts are studied in a number of simulations and applied to real data from gene expression and genetic association studies.
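A sketch of the composite grouped penalty framework described above, in generic notation assumed for illustration: the objective has the form $Q(\beta) = \frac{1}{2n}\|y - X\beta\|_2^2 + \sum_{g=1}^{G} p^{\mathrm{outer}}_{\lambda}\big( \sum_{j \in g} p^{\mathrm{inner}}(|\beta_j|) \big)$, and particular choices of the outer and inner penalties recover the group lasso, group bridge, and group MCP. Coordinate descent then cycles through one-dimensional updates; for the plain lasso with standardized predictors this reduces to the soft-thresholding step $\hat\beta_j \leftarrow S(\tfrac{1}{n} x_j^{\top} r^{(j)}, \lambda)$, where $S(z,\lambda) = \mathrm{sign}(z)(|z| - \lambda)_{+}$ and $r^{(j)}$ is the partial residual with predictor $j$ excluded.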
28

Grouped variable selection in high dimensional partially linear additive Cox model

Liu, Li, 01 December 2010
In the analysis of survival outcomes supplemented with both clinical information and high-dimensional gene expression data, the traditional Cox proportional hazards model fails to meet some emerging needs in biological research. First, the number of covariates is generally much larger than the sample size. Second, predicting an outcome with individual gene expressions is inadequate because a gene's expression is regulated by multiple biological processes and functional units. There is a need to understand the impact of changes at a higher level such as molecular function, cellular component, biological process, or pathway. The change at a higher level is usually measured with a set of gene expressions related to the biological process. That is, we need to model the outcome with gene sets as variable groups, and the gene sets may also partially overlap. In this thesis work, we investigate the impact of a penalized Cox regression procedure on regularization, parameter estimation, variable group selection, and nonparametric modeling of nonlinear effects with a time-to-event outcome. We formulate the problem as a partially linear additive Cox model with high-dimensional data. We group genes into gene sets and approximate the nonparametric components by truncated series expansions with B-spline bases. After grouping and approximation, the problem of variable selection becomes that of selecting groups of coefficients in a gene set or in an approximation. We apply the group lasso to obtain an initial solution path and reduce the dimension of the problem, and then update the whole solution path with the adaptive group lasso. We also propose a generalized group lasso method to provide more freedom in specifying the penalty and excluding covariates from being penalized. A modified Newton-Raphson method is designed for stable and rapid computation. The core programs are written in the C language. A user-friendly R interface is implemented to perform all the calculations by calling the core programs. We demonstrate the asymptotic properties of the proposed methods. Simulation studies are carried out to evaluate the finite sample performance of the proposed procedure using several tuning parameter selection methods for choosing the point on the solution path as the final estimator. We also apply the proposed approach to two real data examples.
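A schematic form of the model and grouping described above, with illustrative notation assumed rather than taken from the thesis: the hazard is $\lambda(t \mid x, z) = \lambda_0(t)\exp( \sum_{g} x_g^{\top}\beta_g + \sum_{k} f_k(z_k) )$, where the $x_g$ are (possibly overlapping) gene sets with coefficient groups $\beta_g$ and each nonlinear clinical effect $f_k$ is approximated by a truncated B-spline expansion $f_k(z) \approx \sum_{b} \gamma_{kb}\,\phi_{kb}(z)$. Estimation then maximizes the log partial likelihood minus a group lasso penalty $\lambda \sum_{g} \sqrt{d_g}\,\|\theta_g\|_2$ taken over all coefficient groups $\theta_g$ (gene sets and spline coefficient blocks alike), with $d_g$ the group size; the adaptive and generalized group lasso variants reweight or selectively relax this penalty.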
29

Statistical detection with weak signals via regularization

Li, Jinzheng, 01 July 2012
There has been an increasing interest in uncovering smuggled nuclear materials associated with the War on Terror. Detection of special nuclear materials (SNM) hidden in cargo containers is a major challenge in national and international security. We propose a new physics-based method to determine the presence of the spectral signature of one or more nuclides from a poorly resolved spectrum with weak signatures. The method differs from traditional methods that rely primarily on peak finding algorithms. The new approach considers each of the signatures in the library to be a linear combination of subspectra. These subspectra are obtained by assuming a signature consisting of just one of the unique gamma rays emitted by the nuclei. We propose a Poisson regression model for deducing which nuclei are present in the observed spectrum. In recognition that a radiation source generally comprises few nuclear materials, the underlying Poisson model is sparse, i.e., most of the regression coefficients are zero (positive coefficients correspond to the presence of nuclear materials). We develop an iterative algorithm for penalized likelihood estimation that promotes sparsity. We illustrate the efficacy of the proposed method by simulations using a variety of poorly resolved, low signal-to-noise ratio (SNR) situations, which show that the proposed approach enjoys excellent empirical performance even with SNR as low as -15 dB. The proposed method is shown to be variable-selection consistent, in the framework of increasing detection time and under mild regularity conditions. We study the problem of testing for shielding, i.e., the presence of intervening materials that attenuate the gamma ray signal. We show that, as detection time increases to infinity, the Lagrange multiplier test, the likelihood ratio test, and the Wald test are asymptotically equivalent under the null hypothesis, and their asymptotic null distribution is chi-square. We also derive the local power of these tests. In addition, we develop a nonparametric approach for detecting spectra indicative of the presence of SNM. This approach characterizes the shape change in a spectrum from background radiation. We do this by proposing a dissimilarity function that characterizes the complete shape change of a spectrum from the background, over all energy channels. We derive the asymptotic null distributions of the tests in terms of functionals of the Brownian bridge. Simulation results show that the proposed approach is very powerful and promising for detecting weak signals. It is able to accurately detect weak signals with SNR as low as -37 dB.
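A schematic version of the sparse Poisson regression described above, under assumed rather than thesis-specific notation: the count in energy channel $c$ is modeled as $Y_c \sim \mathrm{Poisson}(\mu_c)$ with $\mu_c = b_c + \sum_{j} \beta_j S_{jc}$, where $S_{jc}$ is the template subspectrum of component $j$ in channel $c$, $b_c$ is background, and $\beta_j \ge 0$ is the source intensity, most of which are zero. A penalized likelihood of the form $\sum_{c} ( Y_c \log \mu_c - \mu_c ) - \lambda \sum_{j} \beta_j$ is maximized over $\beta \ge 0$, the $\ell_1$-type penalty encouraging sparse solutions in which only the nuclides actually present receive positive coefficients.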
30

Marginal false discovery rate approaches to inference on penalized regression models

Miller, Ryan, 01 August 2018
Data sets containing large numbers of variables are becoming increasingly common, and sparsity-inducing penalized regression methods, such as the lasso, have become a popular analysis tool for these data sets due to their ability to naturally perform variable selection. However, quantifying the importance of the variables selected by these models is a difficult task. These difficulties are compounded by the tendency of the most predictive models, for example those chosen using procedures like cross-validation, to include substantial numbers of noise variables with no real relationship to the outcome. To address the task of performing inference on penalized regression models, this thesis proposes false discovery rate approaches for a broad class of penalized regression models. This work includes the development of an upper bound for the number of noise variables in a model, as well as local false discovery rate approaches that quantify the likelihood of each individual selection being a false discovery. These methods are applicable to a wide range of penalties, such as the lasso, elastic net, SCAD, and MCP; a wide range of models, including linear regression, generalized linear models, and Cox proportional hazards models; and are also extended to the group regression setting under the group lasso penalty. In addition to studying these methods using numerous simulation studies, the practical utility of these methods is demonstrated using real data from several high-dimensional genome-wide association studies.
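One way the marginal false discovery rate idea can be framed, as a sketch under simplifying independence assumptions and not necessarily the exact estimator developed in the thesis: for a fit at regularization level $\lambda$, define $\mathrm{mFDR}(\lambda) = \mathbb{E}(F_\lambda)/\mathbb{E}(S_\lambda)$, with $S_\lambda$ the number of selected features and $F_\lambda$ the number of selected noise features. For the lasso, the Karush-Kuhn-Tucker conditions imply that a feature is selected only if $|x_j^{\top} r|/n \ge \lambda$, so a standardized noise feature independent of the outcome and the other predictors is selected with probability approximately $2\,\Phi(-\sqrt{n}\,\lambda/\sigma)$, which suggests the plug-in bound $\widehat{\mathbb{E}}(F_\lambda) \le 2\,p\,\Phi(-\sqrt{n}\,\lambda/\hat{\sigma})$ and hence an estimate of $\mathrm{mFDR}(\lambda)$.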
