131 |
Comparison Of Missing Value Imputation Methods For Meteorological Time Series Data
Aslan, Sipan, 01 September 2010
Dealing with missing data in spatio-temporal time series constitutes an important branch of the general missing data problem. Since the statistical properties of time-dependent data are characterized by the sequentiality of observations, any interruption of consecutiveness in a time series will cause severe problems. To make reliable analyses in this case, missing data must be handled cautiously, without disturbing the statistical properties of the series, mainly its temporal and spatial dependencies.
In this study we compare several imputation methods for the appropriate completion of missing values in spatio-temporal meteorological time series. For this purpose, several imputation methods are assessed on their performance for artificially created missing data in monthly total precipitation and monthly mean temperature series obtained from the climate stations of the Turkish State Meteorological Service. The artificially created missing data are estimated using six methods. Single Arithmetic Average (SAA), Normal Ratio (NR) and NR Weighted with Correlations (NRWC) are the three simple methods used in the study. In addition, we used two computationally intensive methods for missing data imputation: a Multi Layer Perceptron type Neural Network (MLPNN) and Monte Carlo Markov Chain based on the Expectation-Maximization algorithm (EM-MCMC). We also propose a modification of the EM-MCMC method in which the results of the simple imputation methods are used as auxiliary variables. Besides an accuracy measure based on squared errors, we propose the Correlation Dimension (CD) technique, itself an important subject of nonlinear dynamic time series analysis, for the appropriate evaluation of imputation performance.
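As a rough illustration of the two simplest methods named above, the sketch below computes an SAA and an NR estimate for one missing monthly value. It assumes the usual textbook definitions (SAA averages the neighbouring stations' values; NR first rescales each neighbour by the ratio of long-term station means); the abstract does not spell out the exact formulas used in the thesis, and the station values and function names are hypothetical.

```python
from statistics import mean


def simple_arithmetic_average(neighbour_values):
    """SAA: estimate the missing value as the plain mean of the neighbours."""
    return mean(neighbour_values)


def normal_ratio(target_normal, neighbour_values, neighbour_normals):
    """NR: rescale each neighbour by target_normal / neighbour_normal, then average."""
    scaled = [target_normal * value / normal
              for value, normal in zip(neighbour_values, neighbour_normals)]
    return mean(scaled)


# Hypothetical monthly precipitation totals (mm) at three neighbouring stations
observed = [40.0, 60.0, 50.0]
# Hypothetical long-term monthly means ("normals") at the same three stations
normals = [50.0, 60.0, 40.0]
# Hypothetical long-term monthly mean at the station with the missing value
target_normal = 60.0

saa_estimate = simple_arithmetic_average(observed)
nr_estimate = normal_ratio(target_normal, observed, normals)
print(saa_estimate, nr_estimate)
```

NR differs from SAA only when the stations have different long-term means, which is why it tends to matter more for precipitation than for smoother variables.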
|
132 |
Additive Latent Variable (ALV) Modeling: Assessing Variation in Intervention Impact in Randomized Field Trials
Toyinbo, Peter Ayo, 23 October 2009
In order to personalize or tailor treatments to maximize impact among different subgroups, there is a need to model not only the main effects of an intervention but also the variation in intervention impact by baseline individual-level risk characteristics. To this end, a suitable statistical model will allow researchers to answer a major research question: who benefits from or is harmed by this intervention program? Commonly in social and psychological research, the baseline risk may be unobservable and has to be estimated from observed indicators that are measured with error; it may also have a nonlinear relationship with the outcome. Most of the existing nonlinear structural equation models (SEMs) developed to address such problems employ polynomial or fully parametric nonlinear functions to define the structural equations. These methods are limited because they require functional forms to be specified beforehand, and even if the models include higher-order polynomials there may be problems when the focus of interest relates to the function over its whole domain.
The objective is to develop a more flexible statistical modeling technique for assessing complex relationships between a proximal/distal outcome and 1) baseline characteristics measured with error, and 2) baseline-treatment interaction, such that the shapes of these relationships are data driven and need not be determined a priori.
In the ALV model structure, the nonlinear components of the regression equations are represented as a generalized additive model (GAM) or a generalized additive mixed-effects model (GAMM).
Replication study results show that the ALV model estimates of underlying relationships in the data are sufficiently close to the true pattern. The ALV modeling technique allows researchers to assess how an intervention affects individuals differently as a function of baseline risk that is itself measured with error, and to uncover complex relationships in the data that might otherwise be missed. Although the ALV approach is computationally intensive, it relieves its users from the need to decide functional forms before the model is run. It can be extended to examine complex nonlinearity between growth factors and distal outcomes in a longitudinal study.
|
133 |
Essays on Trade Agreements, Agricultural Commodity Prices and Unconditional Quantile Regression
Li, Na, 03 January 2014
My dissertation consists of three essays in three different areas: international trade; agricultural markets; and nonparametric econometrics. The first and third essays are theoretical papers, while the second essay is empirical. In the first essay, I develop a political economy model of trade agreements in which the set of policy instruments is endogenously determined, providing a rationale for countervailing duties (CVDs). Trade-related policy intervention is assumed to be largely shaped in response to rent-seeking demand, as is often shown empirically. Consequently, the uncertain circumstances during the lifetime of a trade agreement involve both economic and rent-seeking conditions. The latter approximates the actual trade policy decisions more closely than the externality hypothesis and thus provides scope for empirical testing. The second essay tests whether normal mixture (NM) generalized autoregressive conditional heteroscedasticity (GARCH) models adequately capture the relevant properties of agricultural commodity prices. Volatility series were constructed for ten agricultural commodity weekly cash prices. NM-GARCH models allow for heterogeneous volatility dynamics among different market regimes. Both in-sample fit and out-of-sample forecasting tests confirm that the two-state NM-GARCH approach performs significantly better than the traditional normal GARCH model. For each commodity, it is found that an expected negative price change corresponds to a higher volatility persistence, while an expected positive price change arises in conjunction with a greater responsiveness of volatility. In the third essay, I propose an estimator for a nonparametric additive unconditional quantile regression model. Unconditional quantile regression is able to assess the possibly different impacts of covariates on different unconditional quantiles of a response variable.
The proposed estimator does not require d-dimensional nonparametric regression and therefore has no curse of dimensionality. In addition, the estimator has an oracle property in the sense that the asymptotic distribution of each additive component is the same as the case when all other components are known. Both numerical simulations and an empirical application suggest that the new estimator performs much better than alternatives. / the Canadian Agricultural Trade Policy and Competitiveness Research Network, the Structure and Performance of Agriculture and Agri-products Industry Network, and the Institute for the Advanced Study of Food and Agricultural Policy.
|
134 |
Statistical Methods for Life History Analysis Involving Latent Processes
Shen, Hua, January 2014
Incomplete data often arise in the study of life history processes. Examples include missing responses, missing covariates, and unobservable latent processes in addition to right censoring. This thesis is on the development of statistical models and methods to address these problems as they arise in oncology and chronic disease. Methods of estimation and inference in parametric, weakly parametric and semiparametric settings are investigated.
Studies of chronic diseases routinely sample individuals subject to conditions on an event time of interest. In epidemiology, for example, prevalent cohort studies aiming to evaluate risk factors for survival following onset of dementia require subjects to have survived to the point of screening. In clinical trials designed to assess the effect of experimental cancer treatments on survival, patients are required to survive from the time of cancer diagnosis to recruitment. Such conditions yield samples featuring left-truncated event time distributions. Incomplete covariate data often arise in such settings, but standard methods do not deal with the fact that the covariate distribution is also affected by left truncation. We develop a likelihood and algorithm for estimation for dealing with incomplete covariate data in such settings. An expectation-maximization algorithm deals with the left truncation by using the covariate distribution conditional on the selection criterion. An extension to deal with sub-group analyses in clinical trials is described for the case in which the stratification variable is incompletely observed.
In studies of affective disorder, individuals are often observed to experience recurrent exacerbations of symptoms warranting hospitalization. Interest lies in modeling the occurrence of such exacerbations over time and identifying associated risk factors to better understand the disease process. In some patients, recurrent exacerbations are temporally clustered following disease onset, but cease to occur after a period of time. We develop a dynamic mover-stayer model in which a canonical binary variable associated with each event indicates whether the underlying disease has resolved. An individual whose disease process has not resolved will experience events following a standard point process model governed by a latent intensity. If and when the disease process resolves, the complete data intensity becomes zero and no further events will arise. An expectation-maximization algorithm is developed for parametric and semiparametric model fitting based on a discrete time dynamic mover-stayer model and a latent intensity-based model of the underlying point process. The method is applied to a motivating dataset from a cohort of individuals with affective disorder experiencing recurrent hospitalization for their mental health disorder.
Interval-censored recurrent event data arise when the event of interest is not readily observed but the cumulative event count can be recorded at periodic assessment times. Extensions on model fitting techniques for the dynamic mover-stayer model are discussed and incorporate interval censoring. The likelihood and algorithm for estimation are developed for piecewise constant baseline rate functions and are shown to yield estimators with small empirical bias in simulation studies. Data on the cumulative number of damaged joints in patients with psoriatic arthritis are analysed to provide an illustrative application.
|
135 |
Inferência em modelos de mistura via algoritmo EM estocástico modificado / Inference on mixture models via modified stochastic EM algorithm
Assis, Raul Caram de, 02 June 2017
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / We present the topic and theory of mixture models, reviewing theoretical aspects and interpretations of such mixtures. We develop the theory of these models in the contexts of maximum likelihood and Bayesian inference. We review existing clustering methods in both contexts, with emphasis on two: the stochastic EM algorithm in the maximum likelihood context and the Dirichlet Process Mixture Model in the Bayesian context. We propose a new method, a modification of the stochastic EM algorithm, which can be used to estimate the parameters of a mixture model while allowing solutions with different numbers of groups.
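For orientation, here is a minimal sketch of the standard (unmodified) stochastic EM algorithm for a two-component univariate Gaussian mixture. It is not the modified method proposed in the dissertation, which the abstract does not describe in detail: the number of components is fixed at two, and the initial values, burn-in length and simulated data are all illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def stochastic_em(data, n_iter=200, burn_in=100, seed=0):
    """Standard stochastic EM for a two-component Gaussian mixture (sketch)."""
    rng = random.Random(seed)
    n = len(data)
    # Crude starting values: data extremes, unit spread, equal weights
    mu = [min(data), max(data)]
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]
    kept = []
    for it in range(n_iter):
        # E-step and S-step: draw one label per point from its posterior
        groups = ([], [])
        for x in data:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            post0 = p0 / total if total > 0.0 else 0.5
            groups[0 if rng.random() < post0 else 1].append(x)
        # M-step: complete-data maximum likelihood on the sampled partition
        # (skipped when a group is nearly empty, keeping the previous values)
        if min(len(groups[0]), len(groups[1])) >= 2:
            for k in (0, 1):
                g = groups[k]
                mu[k] = sum(g) / len(g)
                var = sum((x - mu[k]) ** 2 for x in g) / len(g)
                sigma[k] = max(math.sqrt(var), 1e-3)
                weight[k] = len(g) / n
        if it >= burn_in:
            kept.append((mu[0], mu[1], sigma[0], sigma[1], weight[0]))
    # Point estimate: average of the post-burn-in iterates
    return [sum(column) / len(kept) for column in zip(*kept)]


# Simulated data: 200 draws from N(0, 1) and 200 draws from N(6, 1)
gen = random.Random(1)
sample = ([gen.gauss(0.0, 1.0) for _ in range(200)]
          + [gen.gauss(6.0, 1.0) for _ in range(200)])
mu0, mu1, sigma0, sigma1, weight0 = stochastic_em(sample)
print(mu0, mu1, sigma0, sigma1, weight0)
```

Because labels are sampled rather than averaged, the iterates form a Markov chain that fluctuates around the maximum likelihood estimate, which is why the sketch averages them after a burn-in instead of reading off the last one.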
|
136 |
[en] MULTIPLE IMPUTATION IN MULTIVARIATE NORMAL DATA VIA A EM TYPE ALGORITHM / [pt] UM ALGORITMO - EM - PARA IMPUTAÇÃO MÚLTIPLA DE DADOS CENSURADOS
FABIANO SALDANHA GOMES DE OLIVEIRA, 05 July 2002
We built an EM-type algorithm to estimate the parameters of a multivariate truncated normal distribution by maximum likelihood. The imputed values are computed as the conditional mean given that the value is greater (or smaller) than the observed one. Because estimation is by maximum likelihood, the information matrix yields confidence intervals for the parameters and for the imputed values. The algorithm was tested on simulated data and on a real data set (for which the normality assumption does not in fact hold).
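The imputation step described above relies on the mean of a normal variable conditional on lying above (or below) a censoring point. The sketch below shows the univariate closed form only, as a simplified stand-in for the multivariate case treated in the thesis; the function names are illustrative.

```python
import math


def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_above(mu, sigma, c):
    """E[X | X > c] for X ~ N(mu, sigma^2)."""
    a = (c - mu) / sigma
    return mu + sigma * std_normal_pdf(a) / (1.0 - std_normal_cdf(a))


def mean_below(mu, sigma, c):
    """E[X | X < c] for X ~ N(mu, sigma^2)."""
    a = (c - mu) / sigma
    return mu - sigma * std_normal_pdf(a) / std_normal_cdf(a)


# A value known only to exceed 10, under a hypothetical N(10, 2^2) model
imputed = mean_above(10.0, 2.0, 10.0)
print(imputed)
```

In an EM iteration these conditional means (together with the matching conditional second moments) would replace the censored values before the parameters are re-estimated.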
|
137 |
Modèles stochastiques des processus de rayonnement solaire / Stochastic models of solar radiation processes
Tran, Van Ly, 12 December 2013
Characteristics of solar radiation depend strongly on unobserved meteorological events such as the frequency, height and type of clouds and their optical properties (atmospheric aerosols, ground albedo, water vapor, dust and atmospheric turbidity), whereas a sequence of solar radiation can be observed and measured at a given station. This suggested modelling solar radiation (or clearness index) processes with a hidden Markov model (HMM), a pair of correlated stochastic processes. Our main model is a continuous-time HMM $(X_t, y_t)_{t \ge 0}$ such that the observed solar radiation process $(y_t)_{t \ge 0}$ is a solution of the stochastic differential equation (SDE) $dy_t = [g(X_t) I_t - y_t]\,dt + \sigma(X_t)\, y_t\, dW_t$, where $I_t$ is the extraterrestrial radiation received at time $t$, $(W_t)$ is a standard Brownian motion, and $g(X_t)$, $\sigma(X_t)$ are functions of the unobserved Markov chain $(X_t)$ modelling the dynamics of environmental regimes. To fit our models to observed data, the estimation procedures combine the Expectation-Maximization (EM) algorithm with the change-of-measure method based on Girsanov's theorem. Filtering equations are derived, and the continuous-time equations are approximated by robust versions. The fitted models are applied to the comparison and classification of distributions and to prediction.
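To make the model equation concrete, the following sketch simulates one path of the SDE with an Euler-Maruyama scheme and a two-state hidden chain. It covers simulation only, not the EM and Girsanov-based estimation described in the abstract. The regime values, switching probabilities, constant extraterrestrial radiation and the discrete-time approximation of the hidden chain are all assumptions made for illustration.

```python
import math
import random


def simulate_radiation(g, sigma, switch_prob, extraterrestrial,
                       y0, dt, n_steps, seed=0):
    """Euler-Maruyama path of dy = [g(X) I - y] dt + sigma(X) y dW, X in {0, 1}."""
    rng = random.Random(seed)
    state = 0
    y = y0
    path = [y]
    for step in range(n_steps):
        # Hidden regime: switch with a small per-step probability
        # (a discrete-time approximation of the continuous-time chain)
        if rng.random() < switch_prob[state]:
            state = 1 - state
        i_t = extraterrestrial(step * dt)
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        y = y + (g[state] * i_t - y) * dt + sigma[state] * y * dw
        path.append(y)
    return path


# Hypothetical setting: a "clear" regime (0) and a "cloudy" regime (1)
path = simulate_radiation(
    g=(0.75, 0.30),
    sigma=(0.05, 0.25),
    switch_prob=(0.01, 0.02),
    extraterrestrial=lambda t: 1000.0,  # constant I_t, for simplicity only
    y0=500.0,
    dt=0.01,
    n_steps=2000,
)
print(path[-1])
```

The drift pulls $y_t$ toward $g(X_t) I_t$, so each regime has its own level of radiation, while $\sigma(X_t)$ controls how noisy the path is within that regime.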
|
138 |
Inferência em modelos de mistura via algoritmo EM estocástico modificado / Inference on Mixture Models via Modified Stochastic EM
Raul Caram de Assis, 02 June 2017
We present the topic and theory of mixture models, reviewing theoretical aspects and interpretations of such mixtures. We develop the theory of these models in the contexts of maximum likelihood and Bayesian inference. We review existing clustering methods in both contexts, with emphasis on two: the stochastic EM algorithm in the maximum likelihood context and the Dirichlet Process Mixture Model in the Bayesian context. We propose a new method, a modification of the stochastic EM algorithm, which can be used to estimate the parameters of a mixture model while allowing solutions with different numbers of groups.
|
139 |
Modelos de mistura para dados com distribuições Poisson truncadas no zero / Mixture models for data with zero truncated Poisson distributions
Andressa do Carmo Gigante, 22 September 2017
Mixture models have long been used, but have recently attracted more attention owing to the development of more efficient estimation methods. In this dissertation, the mixture model is used as a way of clustering or segmenting data under the Poisson and zero-truncated Poisson distributions. Two approaches to the truncation problem were studied. In the first, truncation is applied to each mixture component, that is, each component follows a zero-truncated Poisson distribution. In the second, truncation is applied to the resulting mixture model, built from the usual Poisson distribution. The parameters of interest were estimated by maximum likelihood, which requires an iterative method; we therefore implemented the EM algorithm for both approaches. To analyse the performance of the resulting algorithms, we carried out a simulation study, in which they produced estimates close to the true parameter values. We applied the algorithms to a real data set from an electronic store, and used the AIC and BIC model selection criteria to choose the best model. Zero truncation appears to affect the component-wise approach more strongly, leaving some estimates for the zero-truncated Poisson distribution heavily biased, whereas the approach that truncates the mixture model directly showed less bias.
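The first approach (truncation inside each component) can be sketched as the EM loop below. This is an illustrative reconstruction under stated assumptions, not the dissertation's implementation: the rate update solves the zero-truncated Poisson mean equation $\lambda / (1 - e^{-\lambda}) = m$ by fixed-point iteration, and the data, starting values and function names are hypothetical.

```python
import math


def ztp_pmf(k, lam):
    """P(K = k) for a zero-truncated Poisson with rate lam, k = 1, 2, ..."""
    return (lam ** k * math.exp(-lam)
            / (math.factorial(k) * (1.0 - math.exp(-lam))))


def solve_rate(target_mean, n_iter=200):
    """Solve lam / (1 - exp(-lam)) = target_mean by fixed-point iteration."""
    lam = target_mean
    for _ in range(n_iter):
        lam = target_mean * (1.0 - math.exp(-lam))
    return lam


def log_likelihood(data, weights, rates):
    """Observed-data log-likelihood of the mixture."""
    return sum(math.log(sum(w * ztp_pmf(k, lam)
                            for w, lam in zip(weights, rates)))
               for k in data)


def em_ztp_mixture(data, weights, rates, n_iter=50):
    """EM for a mixture of zero-truncated Poisson components (sketch)."""
    history = [log_likelihood(data, weights, rates)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count
        resp = []
        for k in data:
            joint = [w * ztp_pmf(k, lam) for w, lam in zip(weights, rates)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update the weights, then solve the mean equation per component
        new_weights, new_rates = [], []
        for j in range(len(rates)):
            mass = sum(r[j] for r in resp)
            weighted_mean = sum(r[j] * k for r, k in zip(resp, data)) / mass
            new_weights.append(mass / len(data))
            # Guard: a weighted mean of exactly 1 would give a zero rate
            new_rates.append(solve_rate(max(weighted_mean, 1.0 + 1e-6)))
        weights, rates = new_weights, new_rates
        history.append(log_likelihood(data, weights, rates))
    return weights, rates, history


# Hypothetical purchase counts per customer (zeros are never observed)
counts = [1, 1, 2, 1, 2, 3, 1, 2, 8, 9, 10, 7, 11, 9, 8, 10]
weights, rates, history = em_ztp_mixture(counts, [0.5, 0.5], [1.0, 5.0])
print(weights, rates)
```

The second approach in the abstract would instead keep ordinary Poisson components and divide the whole mixture by one minus its probability of zero, which changes both the E-step and the M-step.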
|
140 |
Classification non supervisée et sélection de variables dans les modèles mixtes fonctionnels. Applications à la biologie moléculaire / Curve clustering and variable selection in mixed effects functional models. Applications to molecular biology
Giacofci, Joyce, 22 October 2013
A growing number of scientific fields collect large amounts of data consisting of sets of curves recorded on individuals. Such data can be seen as an extension of longitudinal data to high dimension and are naturally modeled as functional data in a mixed-effects framework. In the first part, we focus on unsupervised clustering of these curves in the presence of inter-individual variability. To this end, we develop a new procedure based on a wavelet representation of the model, for both fixed and random effects. Our approach has two steps: a dimension reduction step based on wavelet thresholding techniques, followed by a clustering step applied to the selected coefficients, in which an EM algorithm is used for maximum likelihood estimation of the parameters. The properties of the overall procedure are validated by an extensive simulation study. We also illustrate our method on high-throughput molecular data (omics data) such as microarray CGH or mass spectrometry data. The procedure is implemented in the R package "curvclust", available on the CRAN website. In the second part, we concentrate on estimation and dimension reduction issues in the mixed-effects functional framework, and develop two distinct approaches. The first deals with parameter estimation in a nonparametric setting: we demonstrate that the functional fixed-effects estimator based on wavelet thresholding achieves the expected rate of convergence toward the true function. The second is dedicated to the selection of both fixed and random effects: we propose a method based on a penalized likelihood criterion with SCAD penalties for the estimation and selection of both the fixed effects and the random-effects variances. In the context of variable selection, we prove that the penalized estimators enjoy the oracle property when the signal size diverges with the sample size. A simulation study is carried out to assess the behaviour of the two proposed approaches.
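For readers unfamiliar with the SCAD penalty mentioned above, the sketch below evaluates its standard piecewise form (Fan and Li), with the commonly used default a = 3.7. How the thesis applies the penalty to fixed effects and random-effects variances is not reproduced here; the function name and example values are illustrative.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic transition, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


# Small coefficients are penalized like the lasso; large ones hit a ceiling
for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

The flat region for large coefficients is what removes the bias that a lasso-type penalty would impose on strong effects, and it underlies the oracle property cited in the abstract.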
|