31 |
Misturas finitas de misturas de escala skew-normal / Mixtures modelling using scale mixtures of skew-normal distribution / Basso, Rodrigo Marreiro / 03 December 2009
Advisor: Victor Hugo Lachos Davila / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica
Previous issue date: 2009 / Abstract: In this work we consider a flexible class of models based on finite mixtures of multivariate scale mixtures of skew-normal distributions. An EM-type algorithm is employed to compute maximum likelihood estimates iteratively, with emphasis on finite mixtures of skew-normal, skew-t, skew-slash and skew-contaminated normal distributions. A general information-based method for approximating the asymptotic covariance matrix of the maximum likelihood estimates is also presented. Results from the analysis of four real data sets illustrate the usefulness of the proposed methodology. / Master's / Master in Statistics
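The EM iteration mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes the simplest symmetric special case, a two-component univariate Gaussian mixture, and leaves out skewness and scale mixing entirely. The function names, the initialisation and the simulated data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=200):
    """Fit a two-component univariate Gaussian mixture by EM.

    Returns (weights, means, sigmas, log-likelihood trace).
    """
    # Crude initialisation (an assumption): start the means at the extremes.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    overall_mean = sum(data) / len(data)
    spread = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / len(data))
    sigmas = [spread, spread]
    trace = []

    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component j.
        resp = []
        loglik = 0.0
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], sigmas[j]) for j in range(2)]
            total = dens[0] + dens[1]
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        trace.append(loglik)

        # M-step: weighted maximum likelihood updates of all parameters.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-12))  # guard against collapse

    return weights, means, sigmas, trace


# Simulated data (hypothetical): two well-separated normal clusters.
random.seed(0)
sample = [random.gauss(-2.0, 1.0) for _ in range(300)] + [random.gauss(3.0, 1.0) for _ in range(300)]
weights, means, sigmas, trace = em_two_gaussians(sample)
```

The log-likelihood trace should never decrease from one iteration to the next, which is a convenient check on any EM implementation. The skew-normal, skew-t and related cases add latent scale and skewness variables to the E-step, which this sketch does not attempt.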
|
32 |
Apprentissage à partir de données et de connaissances incertaines : application à la prédiction de la qualité du caoutchouc / Learning from uncertain data and knowledge : application to the natural rubber quality prediction / Sutton-Charani, Nicolas / 28 May 2014
When learning predictive models, the quality of the available data is essential to the reliability of the resulting predictions. In practice, these learning data are very often imperfect or uncertain (imprecise, noisy, etc.). This PhD thesis addresses this setting, using the theory of belief functions to adapt standard statistical tools to uncertain data. The chosen predictive model is the decision tree, a basic classifier in artificial intelligence that is usually built from precise data. The aim of the main methodology developed in this thesis is to generalise decision trees to uncertain data (fuzzy, probabilistic, missing, etc.) in both input and output. The main tool for this extension is a likelihood adapted to belief functions, recently proposed in the literature, some properties of which are studied here in depth. Maximising this likelihood provides estimators of the trees' parameters. The maximisation is carried out with the E2M algorithm, an extension of the EM algorithm to belief functions. The resulting methodology, E2M decision trees, is then applied to a real case: the prediction of natural rubber quality. The learning data, mainly crop-related and climatic, contain many uncertainties, which are modelled by belief functions adapted to those imperfections. After a standard descriptive statistical study of the data, E2M decision trees are built, evaluated and compared with standard decision trees. Taking data uncertainty into account improves predictive accuracy only very slightly, but it highlights the importance of some variables that rubber experts have given little attention until now.
|
33 |
Modelos para dados censurados sob a classe de distribuições misturas de escala skew-normal / Censored regression models under the class of scale mixture of skew-normal distributions / Massuia, Monique Bettio, 1989- / 03 June 2015
Advisor: Víctor Hugo Lachos Dávila / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica
Previous issue date: 2015 / Abstract: This work presents linear regression models with censored responses under the class of scale mixtures of skew-normal (SMSN) distributions, generalizing the well-known Tobit model by providing more robust alternatives to the normal distribution. A study based on classical inference is developed for these censored models under two special cases of this family of distributions, the normal and the Student-t, using the EM algorithm to obtain maximum likelihood estimates of the model parameters and developing diagnostic methods for global and local influence based on the methodology proposed by Cook (1986) and Poon & Poon (1999). Under a Bayesian approach, the censored regression model is studied under some special cases of the SMSN class, such as the normal, Student-t, skew-normal, skew-t and skew-slash. In these cases, the Gibbs sampler is the main tool used for inference about the model parameters. We also present some simulation studies to evaluate the developed methodology, which is finally applied to two real data sets. The R packages SMNCensReg, CensRegMod and BayesCR give computational support to this work. / Master's / Statistics / Master in Statistics
|
34 |
Estimation of prevalence on psychiatric mentally disorders on Shatin community. / January 2001
Leung Siu-Ngan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 72-74). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Background / Chapter 1.2 --- Structure and Contents of Data Sets / Chapter 2 --- Estimation of Prevalence of Mentally Disorders / Chapter 2.1 --- Likelihood Function Approach / Chapter 2.2 --- Maximum Likelihood Estimation via EM Algorithm / Chapter 2.3 --- The SEM Algorithm / Chapter 3 --- Estimation of Lifetime Comorbidity / Chapter 3.1 --- What is Comorbidity? / Chapter 3.2 --- Likelihood Function Approach / Chapter 3.2.1 --- Likelihood Function Model / Chapter 3.2.2 --- Maximum Likelihood Estimation via EM Algorithm / Chapter 3.2.3 --- Odds Ratio / Chapter 4 --- Logistic Regression / Chapter 4.1 --- Imputation Method of Missing Values / Chapter 4.1.1 --- Hot Deck Imputation / Chapter 4.1.2 --- A logistic Regression Imputation Model for Dichotomous Response / Chapter 4.2 --- Combining Results from Different Imputed Data Sets / Chapter 4.3 --- Itemization on Screening / Chapter 4.3.1 --- Methods of Weighting on the Screening Questions / Chapter 4.3.2 --- Statistical Analysis / Chapter 5 --- Conclusion and Discussion / Appendix: SRQ Questionnaire / Bibliography
|
35 |
Modelos não lineares sob a classe de distribuições misturas da escala skew-normal / Nonlinear models based on scale mixtures skew-normal distributions / Medina Garay, Aldo William / 07 August 2010
Advisors: Victor Hugo Lachos Dávila, Filidor Edilfonso Vilca Labra / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Previous issue date: 2010 / Abstract: In this work, we study some aspects of estimation and of global and local influence diagnostics in nonlinear models under the class of scale mixtures of skew-normal (SMSN) distributions, based on the methodology proposed by Cook (1986) and Poon & Poon (1999). Heteroscedastic nonlinear models are also discussed. This new class of models is a robust generalization of symmetric nonlinear regression models, and its particular members include heavy-tailed distributions such as the skew-t, skew-slash and skew-contaminated normal, among others. Parameters are estimated with the EM algorithm proposed by Dempster et al. (1977). Hypothesis tests for the homogeneity of the scale parameter are considered using the score and likelihood ratio statistics. Properties of the test statistics are investigated through Monte Carlo simulations. Numerical examples with real and simulated data are presented to illustrate the methodology. / Master's / Statistical Methods / Master in Statistics
|
36 |
Distribuições misturas de escala skew-normal : estimação e diagnostico em modelos lineares / Scale mixtures of skew-normal distributions : estimation and diagnostics for linear models / Zeller, Camila Borelli / 14 August 2018
Advisors: Filidor E. Vilca Labra, Victor Hugo Lachos Davila / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica
Previous issue date: 2009 / Abstract: In this work, we study some aspects of estimation and of diagnostics based on local influence (Cook, 1986) in linear models under the class of scale mixtures of skew-normal (SMSN) distributions, as proposed by Branco & Dey (2001). Specifically, we consider the linear regression model, the linear mixed model and Grubbs' measurement error model. The SMSN class provides a useful generalization of the normal and skew-normal distributions, since it covers asymmetric and heavy-tailed distributions such as the skew-t, the skew-slash and the skew-contaminated normal, among others. Parameters are estimated via the EM algorithm (Dempster et al., 1977), and the local influence analysis is based on the conditional expectation of the complete-data log-likelihood function (the Q-function) from the EM algorithm, as proposed by Zhu & Lee (2001) and Lee & Xu (2004). We believe these results contribute to the development of linear models, since they extend results from Pinheiro et al. (2001), Arellano-Valle et al. (2005), Osorio (2006), Montenegro et al. (2009a), Montenegro et al. (2009b), Osorio et al. (2009) and Lachos et al. (2010), among others. / Doctorate / Statistical Method / Doctor in Statistics
|
37 |
Modelo de regressão linear mistura de escala normal com ponto de mudança : estimação e diagnóstico / Scale mixture of normal regression linear regression model with change point : estimation and diagnostics / Huaira Contreras, Carlos Alberto, 1971- / 25 August 2018
Advisor: Filidor Edilfonso Vilca Labra / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica
Previous issue date: 2014 / Abstract: Linear models are widely used in statistics to describe the relationship between a response variable and one or more explanatory variables, usually under the assumption that the errors are normally distributed. Moreover, linear regression models assume that the same linear model holds for the whole data set, but this is not always valid. The model may change after a specific point, in which case a linear model with a change point may be appropriate for the data set. The main objective of this work is to study some aspects of estimation and diagnostic analysis in linear regression models with a change point under scale mixtures of normal distributions. The diagnostic analysis is based on the works of Cook (1986) and Zhu & Lee (2001). The results obtained extend some results in the literature; see, for example, Chen (1998) and Osorio & Galea (2005). Finally, Monte Carlo simulation studies are carried out and numerical examples are presented to illustrate the proposed results. / Master's / Statistics / Master in Statistics
|
38 |
Performance analysis of EM-MPM and K-means clustering in 3D ultrasound breast image segmentation / Yang, Huanyi / 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Mammographic density is an important risk factor for breast cancer; detection and screening at an early stage could help save lives. To analyze breast density distribution, a good segmentation algorithm is needed. In this thesis, we compare two widely used segmentation algorithms, EM-MPM and K-means clustering. We apply them to twenty cases of synthetic phantom ultrasound tomography (UST) and to nine cases of clinical mammogram and UST images. In the synthetic phantom comparison, EM-MPM achieves better segmentation accuracy than K-means clustering: its segmentation fits the ground truth data very well, with a superior Tanimoto coefficient and parenchyma percentage. EM-MPM can use a Bayesian prior assumption, which takes advantage of the 3D structure and finds a better localized segmentation. EM-MPM performs significantly better for highly dense tissue scattered within low-density tissue and for volumes with low contrast between high- and low-density tissues. For the clinical mammograms, the segmentation comparison again shows that EM-MPM outperforms K-means clustering, since it identifies the dense tissue more clearly and accurately. The superior EM-MPM results shown in this study point to a promising future application in density proportion and potential cancer risk evaluation.
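The Tanimoto coefficient cited as an accuracy measure can be sketched as follows. This assumes the usual set-based definition for binary masks (shared foreground voxels divided by voxels in either mask); the thesis may use a different variant, and the function name and toy masks are hypothetical.

```python
def tanimoto(mask_a, mask_b):
    """Tanimoto (Jaccard) coefficient of two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Two empty masks are treated as perfect agreement (an assumed convention).
    return 1.0 if union == 0 else intersection / union


# Toy flattened masks (hypothetical): 1 marks dense tissue, 0 marks background.
ground_truth = [0, 1, 1, 1, 0, 0, 1, 0]
segmentation = [0, 1, 1, 0, 0, 1, 1, 0]
score = tanimoto(ground_truth, segmentation)  # 3 shared voxels / 5 in the union
```

A score of 1 means the segmentation matches the ground truth exactly, and 0 means the two foreground regions do not overlap at all.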
|
39 |
Variable selection and structural discovery in joint models of longitudinal and survival data / He, Zangdong / January 2014
Indiana University-Purdue University Indianapolis (IUPUI) / Joint models of longitudinal and survival outcomes have been used with increasing frequency in clinical investigations. Correct specification of fixed and random effects, as well as of their functional forms, is essential for practical data analysis. However, no existing methods have been developed to meet this need in a joint model setting. In this dissertation, I describe a penalized likelihood-based method with adaptive least absolute shrinkage and selection operator (ALASSO) penalty functions for model selection. By reparameterizing variance components through a Cholesky decomposition, I introduce a penalty function of group shrinkage; the penalized likelihood is approximated by Gaussian quadrature and optimized by an EM algorithm. The functional forms of the independent effects are determined through a procedure for structural discovery. Specifically, I first construct the model with penalized cubic B-splines and then decompose the B-spline into linear and nonlinear elements by spectral decomposition. The decomposition represents the model in a mixed-effects model format, and I then use the mixed-effects variable selection method to perform structural discovery. Simulation studies show excellent performance. A clinical application is described to illustrate the use of the proposed methods, and the analytical results demonstrate the usefulness of the methods.
|
40 |
Multivariate semiparametric regression models for longitudinal data / Li, Zhuokai / January 2014
Multiple-outcome longitudinal data are abundant in clinical investigations. For example, infections with different pathogenic organisms are often tested concurrently, and assessments are usually taken repeatedly over time. It is therefore natural to consider a multivariate modeling approach to accommodate the underlying interrelationship among the multiple longitudinally measured outcomes. This dissertation proposes a multivariate semiparametric modeling framework for such data. Relevant estimation and inference procedures as well as model selection tools are discussed within this modeling framework. The first part of this research focuses on the analytical issues concerning binary data. The second part extends the binary model to a more general situation for data from the exponential family of distributions. The proposed model accounts for the correlations across the outcomes as well as the temporal dependency among the repeated measures of each outcome within an individual. An important feature of the proposed model is the addition of a bivariate smooth function for the depiction of concurrent nonlinear and possibly interacting influences of two independent variables on each outcome. For model implementation, a general approach for parameter estimation is developed by using the maximum penalized likelihood method. For statistical inference, a likelihood-based resampling procedure is proposed to compare the bivariate nonlinear effect surfaces across the outcomes. The final part of the dissertation presents a variable selection tool to facilitate model development in practical data analysis. Using the adaptive least absolute shrinkage and selection operator (LASSO) penalty, the variable selection tool simultaneously identifies important fixed effects and random effects, determines the correlation structure of the outcomes, and selects the interaction effects in the bivariate smooth functions. 
Model selection and estimation are performed through a two-stage procedure based on an expectation-maximization (EM) algorithm. Simulation studies are conducted to evaluate the performance of the proposed methods. The utility of the methods is demonstrated through several clinical applications.
|