  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Seleção de modelos para segmentação de sequências simbólicas usando máxima verossimilhança penalizada / A model selection criterion for the segmentation of symbolic sequences using penalized maximum likelihood

Castro, Bruno Monte de (20 February 2013)
The sequence segmentation problem aims to partition a sequence, or a set of sequences, into a finite number of segments that are as homogeneous as possible. In this work we consider the problem of segmenting a set of random sequences, with values in a finite alphabet $\mathcal{A}$, into a finite number of independent blocks. We suppose that we have $m$ independent sequences of length $n$, constructed by the concatenation of $s$ segments of lengths $l^{*}_j$, where each block is drawn from a distribution $\mathbb{P}_j$ over $\mathcal{A}^{l^{*}_j}$, $j=1,\cdots,s$. We denote the true cut points by the vector ${\bf k}^{*}=(k^{*}_1,\cdots,k^{*}_{s-1})$, with $k^{*}_i=\sum_{j=1}^{i} l^{*}_j$, $i=1,\cdots,s-1$; these points mark the changes of segment. We propose a penalized maximum likelihood criterion to infer simultaneously the number of cut points and the position of each of them. We also present an algorithm for sequence segmentation and report simulations that illustrate how it works and its convergence speed. Our main result is a proof of the strong consistency of the cut-point estimator as $m$ grows to infinity.
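For orientation, this kind of criterion can be written schematically as follows; the display below is an illustrative form with a generic penalty term $\lambda_{m,n}$, not the exact criterion or constants used in the thesis.

\[
(\hat{s}, \hat{\mathbf{k}}) \;=\; \operatorname*{arg\,max}_{s',\; \mathbf{k}=(k_1,\cdots,k_{s'-1})} \left\{ \sum_{j=1}^{s'} \log \hat{L}_j(\mathbf{k}) \;-\; \lambda_{m,n}\, s' \right\},
\]

where $\hat{L}_j(\mathbf{k})$ is the maximized likelihood, over the $m$ sequences, of the $j$-th block induced by the candidate cut points $\mathbf{k}$, and $\lambda_{m,n} > 0$ is a penalty that discourages overestimating the number of blocks as $m$ grows.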
2

Essays on Estimation Methods for Factor Models and Structural Equation Models

Jin, Shaobo (January 2015)
This thesis, which consists of four papers, is concerned with estimation methods in factor analysis and structural equation models; new estimation methods are proposed and investigated. In Paper I, an approximation of penalized maximum likelihood (ML) is introduced to fit an exploratory factor analysis model. The approximated penalized ML continuously and efficiently shrinks the factor loadings towards zero, naturally factorizes either a covariance matrix or a correlation matrix, and is applicable to both orthogonal and oblique structures. Paper II, a simulation study, investigates the properties of approximated penalized ML with an orthogonal factor model. Different combinations of penalty terms and tuning-parameter selection methods are examined, and differences between factorizing a covariance matrix and factorizing a correlation matrix are explored. It is shown that approximated penalized ML frequently improves on the traditional estimation-rotation procedure. In Paper III we focus on pseudo ML for multi-group data, where data from different groups are pooled and normal theory is used to fit the model. It is shown that pseudo ML produces consistent estimators of the factor loadings and is numerically easier than multi-group ML. Normal theory is, however, not applicable for estimating standard errors, so a sandwich-type estimator of the standard errors is derived. Paper IV examines, through a simulation study, properties of the recently proposed polychoric instrumental variable (PIV) estimators for ordinal data. PIV is compared with conventional estimation methods (unweighted least squares and diagonally weighted least squares). PIV produces accurate estimates of factor loadings and factor covariances in a correctly specified confirmatory factor analysis model, and accurate estimates of loadings and coefficient matrices in a correctly specified structural equation model. If the model is misspecified, the robustness of PIV depends on the model complexity, the underlying distribution, and the choice of instrumental variables.
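As a point of reference, a generic penalized normal-theory ML objective for an orthogonal factor model takes the form below; the penalty function $\rho(\cdot)$ and tuning parameter $\kappa$ are generic placeholders, and the smooth approximation of the penalty studied in Papers I and II is not reproduced here.

\[
F_{\kappa}(\Lambda, \Psi) \;=\; \log\bigl|\Lambda\Lambda^{\top} + \Psi\bigr| \;+\; \operatorname{tr}\!\bigl[S\,(\Lambda\Lambda^{\top} + \Psi)^{-1}\bigr] \;+\; \kappa \sum_{i=1}^{p}\sum_{j=1}^{q} \rho\bigl(|\lambda_{ij}|\bigr),
\]

minimized over the $p \times q$ loading matrix $\Lambda$ and the diagonal uniqueness matrix $\Psi$, where $S$ is the sample covariance (or correlation) matrix and $\rho(\cdot)$ is, for instance, a lasso-type penalty. Shrinking small loadings towards zero in this way replaces the usual two-step estimation-rotation procedure.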
3

High-dimensional inference of ordinal data with medical applications

Jiao, Feiran (01 May 2016)
Ordinal response variables abound in scientific and quantitative analyses. Their outcomes comprise a few categorical values that admit a natural ordering, so they are often represented by non-negative integers, for instance pain score (0-10) or disease severity (0-4) in medical research. Ordinal variables differ from rational variables in that their values delineate qualitative rather than quantitative differences. In this thesis, we develop new statistical methods for variable selection in a high-dimensional cumulative link regression model with an ordinal response. Our study is partly motivated by the need to explore the association structure between disease phenotypes and high-dimensional medical covariates. The cumulative link regression model specifies that the ordinal response of interest results from an order-preserving quantization of some latent continuous variable that bears a linear regression relationship with a set of covariates. Commonly used error distributions in the latent regression include the normal distribution, the logistic distribution, the Cauchy distribution, and the standard Gumbel (minimum) distribution. The cumulative link model with normal (logistic, Gumbel) errors is also known as the ordered probit (logit, complementary log-log) model. While the likelihood function has a closed form for the aforementioned error distributions, its strong nonlinearity can cause direct optimization of the likelihood to fail. To mitigate this problem, and to facilitate the extension to penalized likelihood estimation, we propose specific minorization-maximization (MM) algorithms for maximum likelihood estimation of the cumulative link model under each of the four error distributions. Penalized ordinal regression models come into play when variable selection needs to be performed. In some applications, covariates can be grouped in a meaningful way, but some groups may be mixed in the sense that they contain both relevant and irrelevant variables, i.e., variables whose coefficients are non-zero and zero, respectively. It is therefore pertinent to develop a consistent method for simultaneously selecting the relevant groups and the relevant variables within each selected group, the so-called bi-level selection problem. We propose a penalized maximum likelihood approach with a composite bridge penalty to solve the bi-level selection problem in a cumulative link model, together with an MM algorithm, specific to each of the four error distributions, for implementing it. The proposed approach is shown to enjoy a number of desirable theoretical properties, including bi-level selection consistency and oracle properties, under suitable regularity conditions. Simulations demonstrate good empirical performance, and we illustrate the proposed methods with several real medical applications.
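The cumulative link model and the bi-level penalty described above can be sketched as follows; the group-bridge shape of the composite penalty is an assumed generic form, not necessarily the thesis's exact formulation.

\[
P(Y_i \le j \mid x_i) \;=\; F\bigl(\alpha_j - x_i^{\top}\beta\bigr), \qquad j = 1,\cdots,J-1,
\]

with $F$ the standard normal, logistic, Cauchy, or Gumbel distribution function and $\alpha_1 < \cdots < \alpha_{J-1}$ the cut-offs on the latent scale, and

\[
(\hat{\alpha}, \hat{\beta}) \;=\; \operatorname*{arg\,max}_{\alpha,\,\beta}\; \ell(\alpha,\beta) \;-\; \lambda \sum_{g=1}^{G} \Bigl(\sum_{k \in g} |\beta_k|\Bigr)^{\gamma}, \qquad 0 < \gamma < 1,
\]

where the groups $g = 1,\cdots,G$ partition the covariates; the outer bridge exponent drives group selection while the inner $\ell_1$ sum selects variables within a selected group.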
4

Evaluating Bag Of Little Bootstraps On Logistic Regression With Unbalanced Data

Bark, Henrik (January 2023)
The Bag of Little Bootstraps (BLB) was introduced to make the bootstrap method more computationally efficient when used on massive data samples. Since its introduction, a broad spectrum of research on applications of the BLB has been carried out. However, while the BLB has shown promising results when applied to logistic regression, those results have been obtained on well-balanced data. There is therefore an obvious need for further research into how the BLB performs when the dependent variable is unbalanced, and whether possible performance issues can be remedied through methods such as Firth's Penalized Maximum Likelihood Estimation (PMLE). This thesis shows that imbalance in the dependent variable severely affects the BLB's performance in logistic regression. Further, it shows that PMLE produces mixed and unreliable results when used to remedy this drop in performance.
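For readers unfamiliar with the procedure, the sketch below illustrates the Bag of Little Bootstraps applied to logistic regression. It is a minimal illustration with assumed parameter names (n_subsets, subset_frac, n_resamples), plain (non-Firth) maximum likelihood fitting via scikit-learn, and none of the thesis's experimental design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def blb_coef_se(X, y, n_subsets=5, subset_frac=0.6, n_resamples=50, seed=0):
    """Bag of Little Bootstraps estimate of coefficient standard errors for
    logistic regression. Minimal sketch: no Firth correction, and subsets are
    drawn independently rather than as a disjoint partition of the data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b = int(n ** subset_frac)          # little-bootstrap subset size b = n^gamma
    subset_se = []
    for _ in range(n_subsets):
        idx = rng.choice(n, size=b, replace=False)
        Xs, ys = X[idx], y[idx]
        coefs = []
        for _ in range(n_resamples):
            # Multinomial weights emulate a size-n bootstrap resample of the
            # size-b subset without materializing n rows.
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            # With a severely unbalanced y, a subset can contain a single
            # class and the fit fails; no safeguard is added in this sketch.
            model = LogisticRegression(C=1e6, max_iter=1000)  # ~unpenalized MLE
            model.fit(Xs, ys, sample_weight=w)
            coefs.append(model.coef_.ravel())
        subset_se.append(np.std(coefs, axis=0, ddof=1))
    # BLB averages the per-subset quality measures (here: standard errors).
    return np.mean(np.vstack(subset_se), axis=0)
```

On severely unbalanced data the inner fits can degenerate or fail outright, which is the behaviour the thesis quantifies.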
