Global ETD Search

121	Inferência em modelos de mistura via algoritmo EM estocástico modificado / Inference on Mixture Models via Modified Stochastic EM Assis, Raul Caram de 02 June 2017 We present the topic and theory of mixture models of distributions, reviewing theoretical aspects and interpretations of such mixtures. We develop the theory of these models in the contexts of maximum likelihood and Bayesian inference. We review existing clustering methods in both contexts, with emphasis on two: the stochastic EM algorithm in the maximum likelihood context and the Dirichlet Process Mixture Model in the Bayesian context. We propose a new method, a modification of the stochastic EM algorithm, which can be used to estimate the parameters of a mixture of components while allowing solutions with different numbers of groups. Keywords: EM algorithm; Gibbs sampling; image segmentation; Markov chain; mixture models; mixture of distributions
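As an illustration of the standard stochastic EM scheme that entry 121 takes as its starting point, here is a minimal sketch for a two-component one-dimensional Gaussian mixture. It is not the modified algorithm proposed in the thesis: the fixed number of components, the starting values, the smoothed weight update, and the averaging after a burn-in period are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def stochastic_em(data, n_iter=300, burn_in=100, seed=0):
    """Stochastic EM for a two-component 1-D Gaussian mixture (illustrative).

    Returns (weights, means, sigmas), each averaged over the iterations
    that follow the burn-in period. Requires n_iter > burn_in.
    """
    rng = random.Random(seed)
    n = len(data)
    mu = [min(data), max(data)]  # assumed crude start: one mean at each extreme
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]
    sum_w, sum_mu, sum_sigma = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for it in range(n_iter):
        groups = ([], [])
        for x in data:
            # E-step: posterior probability that x belongs to component 0
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            post0 = p0 / (p0 + p1) if p0 + p1 > 0.0 else 0.5
            # S-step: draw a hard assignment from that posterior
            groups[0 if rng.random() < post0 else 1].append(x)
        # M-step: complete-data estimates from the sampled assignments
        for k in (0, 1):
            size = len(groups[k])
            # smoothed weight (an assumption) so that no component vanishes
            weight[k] = (size + 1.0) / (n + 2.0)
            if size >= 2:
                mu[k] = sum(groups[k]) / size
                var = sum((x - mu[k]) ** 2 for x in groups[k]) / size
                sigma[k] = max(math.sqrt(var), 1e-3)
        if it >= burn_in:
            for k in (0, 1):
                sum_w[k] += weight[k]
                sum_mu[k] += mu[k]
                sum_sigma[k] += sigma[k]
    kept = n_iter - burn_in
    return ([s / kept for s in sum_w],
            [s / kept for s in sum_mu],
            [s / kept for s in sum_sigma])


if __name__ == "__main__":
    gen = random.Random(42)
    sample = [gen.gauss(-3.0, 1.0) for _ in range(200)]
    sample += [gen.gauss(3.0, 1.0) for _ in range(200)]
    print(stochastic_em(sample))
```

Because the assignments are sampled rather than weighted, the parameter sequence forms a Markov chain instead of converging to a single point, which is why this sketch reports an average over the later iterations.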
122	Análise dos resultados de ensaios de proficiência via modelos de regressão com variável explicativa aleatória / Analysis of proficiency tests results via regression models with random explanatory variable Montanari, Aline Othon 21 June 2004 In a proficiency testing (PT) program conducted by the Grupo de Motores, a group of eleven laboratories in the temperature area took measurements at five points on the scale of a thermocouple. In this work, we propose a regression model with a random explanatory variable X representing the standard thermocouple, which we call the artifact, and a dependent variable Y representing the laboratories' measurements. The comparison procedure is simple: the artifact and the laboratory's thermocouple under test are placed in the oven and the differences between their measurements are recorded. For the data analysis, the response variable is the difference between that difference and the reference value, which is determined by two laboratories belonging to the Brazilian Calibration Network (Rede Brasileira de Calibração, RBC). The measurement error has a variance that is determined by calibration and is therefore known. We find approximations to the maximum likelihood estimates of the model parameters via the EM algorithm. In addition, we propose a strategy for assessing the consistency of the laboratories participating in the PT program. Keywords: EM algorithm; estimation; interlaboratory comparisons; measurement uncertainty; proficiency tests; random explanatory variable
123	"Uma aplicação industrial de regressão binária com erros na variável explicativa" / "An industrial application of binary regression with errors-in-variable explanatory" Favari, Daniel Fernando de 22 June 2006 In this work, we apply a binary regression model with measurement error in the explanatory variable to analyze attribute measurement systems. To this end, we use the logistic model with errors in variables, for which we obtain the maximum likelihood estimates via the EM algorithm, together with the observed Fisher information matrix. In addition, we carry out a simulation study to compare the analytic method, the logistic model without errors in variables (naive) and the logistic model with errors in variables. Finally, we apply our methodology to evaluate a go/no-go measurement system at the largest Diesel engine manufacturer (MWM International). Keywords: analytic method; binary regression; Delta method; EM algorithm; Fieller's theorem; measurement errors
124	Modelos lineares generalizados mistos para dados longitudinais. / Generalized linear mixed models in longitudinal data. Costa, Silvano Cesar da 13 March 2003 Experiments whose response variables are proportions or counts are very common in several research areas, especially in agriculture. The theory of generalized linear models, which is widely known (McCullagh & Nelder, 1989; Demétrio, 2001), is used to analyze such experiments when the responses are independent. If the estimated variance is greater than the expected variance, a dispersion parameter is estimated and included in the parameter estimation process. When the response variable is observed over time, the observations may be correlated, and this must be taken into account in the parameter estimation. One way of handling this correlation is the methodology of generalized estimating equations (GEEs) discussed by Liang & Zeger (1986), although in that case the interest lies in the estimates of the fixed effects, and the working correlation matrix is included to obtain a better fit. Another alternative is to include a latent effect in the linear predictor to capture variability that is not considered in the model and may influence the results. In this work, a random effect and a dispersion parameter are combined and included jointly in the parameter estimation. This methodology is applied to a data set from an experiment with camu-camu, whose aim was to evaluate, through the proportion of successful grafts of the seedlings, which grafting methods and rootstock types are best suited for use. Several models are fitted, from the split-plot model (assuming independence) to the model in which the dispersion parameter and the random effect are considered jointly. There is evidence that the model including both the random effect and the dispersion parameter gives better parameter estimates. A second longitudinal data set comes from an experiment with MON810 transgenic corn in which the response variable is the number of caterpillars (Spodoptera frugiperda). In this case, because of the excess of zero responses, the zero-inflated Poisson (ZIP) regression model is used, in addition to the standard Poisson model, in which observations are considered independent, and the zero-inflated Poisson regression model with a random effect. The results show that the random effect included in the linear predictor was not significant and, therefore, the adopted model is the zero-inflated Poisson regression model. The results were obtained using the NLMIXED, GENMOD and GPLOT procedures of SAS (Statistical Analysis System), version 8.2. Keywords: binomial distribution; EM algorithm; generalized linear mixed models; generalized linear models; longitudinal data analysis; Poisson distribution; SAS (computer program)
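To make the zero-inflated Poisson (ZIP) model named in entry 124 concrete, the sketch below fits the simplest version, with no covariates and no random effect, by EM. The thesis itself reports using SAS procedures, so this Python code is only an assumed illustration; the function names `zip_zero_probability` and `fit_zip_em` are hypothetical. Under ZIP, a count is a structural zero with probability pi and otherwise follows a Poisson distribution with mean lam.

```python
import math


def zip_zero_probability(pi, lam):
    """P(Y = 0) under ZIP: a structural zero or a Poisson zero."""
    return pi + (1.0 - pi) * math.exp(-lam)


def fit_zip_em(counts, n_iter=500):
    """EM estimates (pi, lam) for an intercept-only ZIP model (illustrative)."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    if total == 0:
        raise ValueError("need at least one positive count")
    pi, lam = 0.5, total / n  # assumed starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero
        z = pi / zip_zero_probability(pi, lam)
        expected_structural = n_zero * z
        # M-step: closed-form updates given the expected structural zeros
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam


if __name__ == "__main__":
    # Hypothetical counts with many more zeros than a plain Poisson would give
    example = [0] * 50 + [1] * 10 + [2] * 15 + [3] * 15 + [4] * 7 + [5] * 3
    print(fit_zip_em(example))
```

At convergence the fitted model reproduces both the sample mean and the observed share of zeros, which is a quick sanity check on any ZIP fit of this intercept-only form.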
125	EM algorithm for Markov chains observed via Gaussian noise and point process information: Theory and case studies Damian, Camilla, Eksi-Altay, Zehra, Frey, Rüdiger January 2018 In this paper we study parameter estimation via the Expectation Maximization (EM) algorithm for a continuous-time hidden Markov model with diffusion and point process observation. Inference problems of this type arise for instance in credit risk modelling. A key step in the application of the EM algorithm is the derivation of finite-dimensional filters for the quantities that are needed in the E-Step of the algorithm. In this context we obtain exact, unnormalized and robust filters, and we discuss their numerical implementation. Moreover, we propose several goodness-of-fit tests for hidden Markov models with Gaussian noise and point process observation. We run an extensive simulation study to test speed and accuracy of our methodology. The paper closes with an application to credit risk: we estimate the parameters of a hidden Markov model for credit quality where the observations consist of rating transitions and credit spreads for US corporations. MSC 2010: 60G35; 62P05
126	Shluková analýza pro funkcionální data / Cluster analysis for functional data Zemanová, Barbora January 2012 In this work we deal with cluster analysis for functional data. Functional data contain a set of subjects that are characterized by repeated measurements of a variable. Based on these measurements we want to split the subjects into groups (clusters). The subjects in a single cluster should be similar and differ from subjects in the other clusters. The first approach we use is the reduction of data dimension followed by the clustering method K-means. The second approach is to use a finite mixture of normal linear mixed models. We estimate parameters of the model by maximum likelihood using the EM algorithm. Throughout the work we apply all described procedures to real meteorological data.
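The first approach in entry 126, dimension reduction followed by K-means, can be sketched as follows. The abstract does not say which reduction is used, so this example assumes the simplest one: each subject's repeated measurements are summarized by the intercept and slope of a least-squares line. The farthest-point initialization and the helper names are likewise assumptions made for this illustration.

```python
def line_features(times, values):
    """Reduce one subject's curve to (intercept, slope) by least squares."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return (v_mean - slope * t_mean, slope)


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-point start."""
    centres = [points[0]]
    while len(centres) < k:
        # next centre: the point farthest from all centres chosen so far
        centres.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        )
    labels = None
    for _ in range(n_iter):
        # assignment step: nearest centre for every point
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # update step: each centre moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    # Hypothetical subjects: three rising curves and three falling curves
    curves = [[a + 2.0 * t for t in times] for a in (0.0, 0.5, 1.0)]
    curves += [[a - 2.0 * t for t in times] for a in (10.0, 10.5, 9.5)]
    features = [line_features(times, c) for c in curves]
    print(kmeans(features, k=2)[0])
```

Clustering the reduced features instead of the raw measurements keeps the distance computation low-dimensional, at the cost of discarding any curve shape that a straight line cannot capture.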
127	Inferência e diagnósticos em modelos assimétricos / Inference and diagnostics in asymmetric models Clécio da Silva Ferreira 20 March 2008 This work presents a study of inference and diagnostics in asymmetric models. The influence analysis is based on the methodology for models with incomplete data, which is related to the EM algorithm (Zhu and Lee, 2001). In addition to the existing asymmetric normal (Azzalini, 1999) and asymmetric t-normal (Gómez, Venegas and Bolfarine, 2007) regression models, two new classes of models are developed: asymmetric scale mixtures of normal models (encompassing the asymmetric Normal, t-Normal, Slash, Contaminated-Normal and Power-Exponential distributions) and asymmetric robust linear mixed models, which use asymmetric scale mixtures of normal distributions for the random effect and scale mixtures of normal distributions for the random error. For the mixed model, the observed Fisher information matrix is computed using the approximation of Louis (1982) for incomplete data. For all models, EM-type algorithms are developed to provide a numerical solution for the parameters of the regression models. For each regression model, goodness of fit is assessed by visual inspection of the simulated envelope plot. For the asymmetric scale mixtures of normal models, a robustness study of the proposed EM algorithm is carried out to determine the efficacy of the estimators presented. The models are applied to the Australian Institute of Sports (AIS) data set, to a data set on the quality of life of patients (women) with breast cancer, from a study conducted by the Centro de Atenção Integral à Saúde da Mulher (CAISM) together with the Faculty of Medical Sciences of the State University of Campinas, and to the Framingham cholesterol data set. Keywords: asymmetric normal distributions; EM algorithm; mixed models; scale mixtures of normal distributions
128	Stochastic process analysis for Genomics and Dynamic Bayesian Networks inference. Lebre, Sophie 14 September 2007 This thesis is dedicated to the development of statistical and computational methods for the analysis of DNA sequences and gene expression time series. First we study a parsimonious Markov model called the Mixture Transition Distribution (MTD) model, which is a mixture of Markovian transitions. The large number of constraints on the parameters of this model prevents the derivation of an analytical expression for the Maximum Likelihood Estimate (MLE). We propose to approximate the MLE with an EM algorithm. After comparing the performance of this algorithm with results from the literature, we use it to evaluate the relevance of MTD modeling for bacterial DNA coding sequences in comparison with standard Markovian modeling. Then we propose two different approaches for recovering genetic regulatory networks. We model these networks with Dynamic Bayesian Networks (DBNs) whose edges describe the dependency relationships between time-delayed gene expression levels. The aim is to estimate the topology of this graph despite the very low number of repeated measurements compared with the number of observed genes. To address this dimensionality problem, we first assume that the dependency relationships are homogeneous, that is, the graph topology is constant across time. Then we propose to approximate this graph by considering partial order dependencies. The concept of partial order dependence graphs, already introduced for static and undirected graphs, is adapted and characterized for DBNs using the theory of graphical models. From these results, we develop a deterministic procedure for DBN inference. Finally, we relax the homogeneity assumption by considering a succession of several homogeneous phases. We consider a multiple changepoint regression model. Each changepoint indicates a change in the regression model parameters, which corresponds to the way an expression level depends on the others. Using reversible jump MCMC methods, we develop a stochastic algorithm that simultaneously infers the changepoint locations and the structure of the network within the phases delimited by the changepoints. Both approaches are validated on simulated and real data. Keywords: Mathematics; time series; gene expression; genetic networks; network inference; Dynamic Bayesian Networks (DBN); changepoint detection; reversible jump MCMC; partial order dependence; Mixture Transition Distribution (MTD); EM algorithm
129	A Note on the Generalization Performance of Kernel Classifiers with Margin Evgeniou, Theodoros, Pontil, Massimiliano 01 May 2000 We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers arise from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e. functions of the slack variables of SVM) are also derived. Keywords: AI; MIT; Artificial Intelligence; missing data; mixture models; statistical learning; EM algorithm; neural networks; kernel classifiers; Support Vector Machine; regularization networks; statistical learning theory; V-gamma dimension
130	On perfect simulation and EM estimation Larson, Kajsa January 2010 Perfect simulation and the EM algorithm are the main topics of this thesis. In paper I, we present coupling from the past (CFTP) algorithms that generate perfectly distributed samples from the multi-type Widom-Rowlinson (W-R) model and some generalizations of it. The classical W-R model is a point process in the plane or in space consisting of points of several different types. Points of different types are not allowed to be closer than some specified distance, whereas points of the same type can be arbitrarily close. A stick-model and soft-core generalizations are also considered. Further, we generate samples without edge effects, and give a bound on sufficiently small intensities (of the points) for the algorithm to terminate. In paper II, we consider the forestry problem of how to estimate seedling dispersal distributions and effective plant fecundities from spatial data on adult trees and seedlings when the origins of the seedlings are unknown. Traditional models for fecundities build on allometric assumptions, where the fecundity is related to some characteristic of the adult tree (e.g. diameter). However, the allometric assumptions are generally too restrictive and lead to unrealistic estimates. Therefore we present a new model, the unrestricted fecundity (UF) model, which uses no allometric assumptions. We propose an EM algorithm to estimate the unknown parameters. Evaluations on real and simulated data indicate better performance for the UF model. In paper III, we propose EM algorithms to estimate the passage time distribution on a graph. Data are obtained by observing a flow only at the nodes; what happens on the edges is unknown. Therefore the sample of passage times, i.e. the times it takes for the flow to stream between two neighbors, consists of right-censored and uncensored observations where it is sometimes unknown which is which. For discrete passage time distributions, we show that the maximum likelihood (ML) estimate is strongly consistent under certain weak conditions. We also show that our proposed EM algorithm converges to the ML estimate if the sample size is sufficiently large and the starting value is sufficiently close to the true parameter. In a special case we show that it always converges. In the continuous case, we propose an EM algorithm for fitting phase-type distributions to data. Keywords: perfect simulation; coupling from the past; Markov chain Monte Carlo; point process; Widom-Rowlinson model; EM algorithm; dispersal distribution; fecundity; first-passage percolation; mathematical statistics
