111 |
Models and estimation algorithms for nonparametric finite mixtures with conditionally independent multivariate component densities / Modèles et algorithmes d'estimation pour des mélanges finis de densités de composantes multivariées non paramétriques et conditionnellement indépendantesHoang, Vy-Thuy-Lynh 20 April 2017 (has links)
Plusieurs auteurs ont proposé récemment des modèles et des algorithmes pour l'estimation nonparamétrique de mélanges multivariés finis dont l'identifiabilité n'est pas toujours assurée. Entre les modèles considérés, l'hypothèse des coordonnées indépendantes conditionnelles à la sous-population de provenance des individus fait l'objet d'une attention croissante, en raison des développements théoriques et pratiques envisageables, particulièrement avec la multiplicité des variables qui entrent en jeu dans le framework statistique moderne. Dans ce travail, nous considérons d'abord un modèle plus général supposant l'indépendance, conditionnellement à la composante, de blocs multivariés de coordonnées au lieu de coordonnées univariées, permettant toute structure de dépendance à l'intérieur de ces blocs. Par conséquent, les fonctions de densité des blocs sont complètement multivariées et non paramétriques. Nous présentons des arguments d'identifiabilité et introduisons pour l'estimation dans ce modèle deux algorithmes méthodologiques dont les procédures de calcul ressemblent à un véritable algorithme EM mais incluent une étape additionnelle d'estimation de densité: un algorithme rapide montrant l'efficacité empirique sans justification théorique, et un algorithme lissé possédant une propriété de monotonie comme certain algorithme EM, mais plus exigeant en terme de calcul. Nous discutons également les méthodes efficaces en temps de calcul pour l'estimation et proposons quelques stratégies. Ensuite, nous considérons une extension multivariée des modèles de mélange utilisés dans le cadre de tests d'hypothèses multiples, permettant une nouvelle version multivariée de contrôle du False Discovery Rate. Nous proposons une version contrainte de notre algorithme précédent, adaptée spécialement à ce modèle. Le comportement des algorithmes de type EM que nous proposons est étudié numériquement dans plusieurs expérimentations de Monte Carlo et sur des données réelles de grande dimension et comparé avec les méthodes existantes dans la littérature. En n, les codes de nos nouveaux algorithmes sont progressivement ajoutés sous forme de nouvelles fonctions dans le package en libre accès mixtools pour le logiciel de statistique R. / Recently several authors have proposed models and estimation algorithms for finite nonparametric multivariate mixtures, whose identifiability is typically not obvious. Among the considered models, the assumption of independent coordinates conditional on the subpopulation from which each observation is drawn is subject of an increasing attention, in view of the theoretical and practical developments it allows, particularly with multiplicity of variables coming into play in the modern statistical framework. In this work we first consider a more general model assuming independence, conditional on the component, of multivariate blocks of coordinates instead of univariate coordinates, allowing for any dependence structure within these blocks. Consequently, the density functions of these blocks are completely multivariate and nonparametric. We present identifiability arguments and introduce for estimation in this model two methodological algorithms whose computational procedures resemble a true EM algorithm but include an additional density estimation step: a fast algorithm showing empirical efficiency without theoretical justification, and a smoothed algorithm possessing a monotony property as any EM algorithm does, but more computationally demanding. We also discuss computationally efficient methods for estimation and derive some strategies. Next, we consider a multivariate extension of the mixture models used in the framework of multiple hypothesis testings, allowing for a new multivariate version of the False Discovery Rate control. We propose a constrained version of our previous algorithm, specifically designed for this model. The behavior of the EM-type algorithms we propose is studied numerically through several Monte Carlo experiments and high dimensional real data, and compared with existing methods in the literature. Finally, the codes of our new algorithms are progressively implemented as new functions in the publicly-available package mixtools for the R statistical software.
|
112 |
Estimação de modelos afins por partes em espaço de estadosRui, Rafael January 2016 (has links)
Esta tese foca no problema de estimação de estado e de identificação de parâametros para modelos afins por partes. Modelos afins por partes são obtidos quando o domínio do estado ou da entrada do sistema e particionado em regiões e, para cada região, um submodelo linear ou afim e utilizado para descrever a dinâmica do sistema. Propomos um algoritmo para estimação recursiva de estados e um algoritmo de identificação de parâmetros para uma classe de modelos afins por partes. Propomos um estimador de estados Bayesiano que utiliza o filtro de Kalman em cada um dos submodelos. Neste estimador, a função distribuição cumulativa e utilizada para calcular a distribuição a posteriori do estado assim como a probabilidade de cada submodelo. Já o método de identificação proposto utiliza o algoritmo EM (Expectation Maximization algorithm) para identificar os parâmetros do modelo. A função distribuição cumulativa e utilizada para calcular a probabilidade de cada submodelo a partir da medida do sistema. Em seguida, utilizamos o filtro de Kalman suavizado para estimar o estado e calcular uma função substituta da função likelihood. Tal função e então utilizada para identificar os parâmetros do modelo. O estimador proposto foi utilizado para estimar o estado do modelo não linear para vibrações causadas por folgas. Foram realizadas simulações, onde comparamos o método proposto ao filtro de Kalman estendido e o filtro de partículas. O algoritmo de identificação foi utilizado para identificar os parâmetros do modelo do jato JAS 39 Gripen, assim como, o modelos não linear de vibrações causadas por folgas. / This thesis focuses on the state estimation and parameter identi cation problems of piecewise a ne models. Piecewise a ne models are obtained when the state domain or the input domain are partitioned into regions and, for each region, a linear or a ne submodel is used to describe the system dynamics. We propose a recursive state estimation algorithm and a parameter identi cation algorithm to a class of piecewise a ne models. We propose a Bayesian state estimate which uses the Kalman lter in each submodel. In the this estimator, the cumulative distribution is used to compute the posterior distribution of the state as well as the probability of each submodel. On the other hand, the proposed identi cation method uses the Expectation Maximization (EM) algorithm to identify the model parameters. We use the cumulative distribution to compute the probability of each submodel based on the system measurements. Subsequently, we use the Kalman smoother to estimate the state and compute a surrogate function for the likelihood function. This function is used to estimate the model parameters. The proposed estimator was used to estimate the state of the nonlinear model for vibrations caused by clearances. Numerical simulations were performed, where we have compared the proposed method to the extended Kalman lter and the particle lter. The identi cation algorithm was used to identify the model parameters of the JAS 39 Gripen aircraft as well as the nonlinear model for vibrations caused by clearances.
|
113 |
Estimação de modelos afins por partes em espaço de estadosRui, Rafael January 2016 (has links)
Esta tese foca no problema de estimação de estado e de identificação de parâametros para modelos afins por partes. Modelos afins por partes são obtidos quando o domínio do estado ou da entrada do sistema e particionado em regiões e, para cada região, um submodelo linear ou afim e utilizado para descrever a dinâmica do sistema. Propomos um algoritmo para estimação recursiva de estados e um algoritmo de identificação de parâmetros para uma classe de modelos afins por partes. Propomos um estimador de estados Bayesiano que utiliza o filtro de Kalman em cada um dos submodelos. Neste estimador, a função distribuição cumulativa e utilizada para calcular a distribuição a posteriori do estado assim como a probabilidade de cada submodelo. Já o método de identificação proposto utiliza o algoritmo EM (Expectation Maximization algorithm) para identificar os parâmetros do modelo. A função distribuição cumulativa e utilizada para calcular a probabilidade de cada submodelo a partir da medida do sistema. Em seguida, utilizamos o filtro de Kalman suavizado para estimar o estado e calcular uma função substituta da função likelihood. Tal função e então utilizada para identificar os parâmetros do modelo. O estimador proposto foi utilizado para estimar o estado do modelo não linear para vibrações causadas por folgas. Foram realizadas simulações, onde comparamos o método proposto ao filtro de Kalman estendido e o filtro de partículas. O algoritmo de identificação foi utilizado para identificar os parâmetros do modelo do jato JAS 39 Gripen, assim como, o modelos não linear de vibrações causadas por folgas. / This thesis focuses on the state estimation and parameter identi cation problems of piecewise a ne models. Piecewise a ne models are obtained when the state domain or the input domain are partitioned into regions and, for each region, a linear or a ne submodel is used to describe the system dynamics. We propose a recursive state estimation algorithm and a parameter identi cation algorithm to a class of piecewise a ne models. We propose a Bayesian state estimate which uses the Kalman lter in each submodel. In the this estimator, the cumulative distribution is used to compute the posterior distribution of the state as well as the probability of each submodel. On the other hand, the proposed identi cation method uses the Expectation Maximization (EM) algorithm to identify the model parameters. We use the cumulative distribution to compute the probability of each submodel based on the system measurements. Subsequently, we use the Kalman smoother to estimate the state and compute a surrogate function for the likelihood function. This function is used to estimate the model parameters. The proposed estimator was used to estimate the state of the nonlinear model for vibrations caused by clearances. Numerical simulations were performed, where we have compared the proposed method to the extended Kalman lter and the particle lter. The identi cation algorithm was used to identify the model parameters of the JAS 39 Gripen aircraft as well as the nonlinear model for vibrations caused by clearances.
|
114 |
NIG distribution in modelling stock returns with assumption about stochastic volatility : Estimation of parameters and application to VaR and ETLKucharska, Magdalena, Pielaszkiewicz, Jolanta Maria January 2009 (has links)
We model Normal Inverse Gaussian distributed log-returns with the assumption of stochastic volatility. We consider different methods of parametrization of returns and following the paper of Lindberg, [21] we assume that the volatility is a linear function of the number of trades. In addition to the Lindberg’s paper, we suggest daily stock volumes and amounts as alternative measures of the volatility. As an application of the models, we perform Value-at-Risk and Expected Tail Loss predictions by the Lindberg’s volatility model and by our own suggested model. These applications are new and not described in the literature. For better understanding of our caluclations, programmes and simulations, basic informations and properties about the Normal Inverse Gaussian and Inverse Gaussian distributions are provided. Practical applications of the models are implemented on the Nasdaq-OMX, where we have calculated Value-at-Risk and Expected Tail Loss for the Ericsson B stock data during the period 1999 to 2004.
|
115 |
Statistické úlohy pro Markovské procesy se spojitým časem / Statistical inference for Markov processes with continuous timeKřepinská, Dana January 2014 (has links)
Tato diplomová práce se zabývá odhadováním matice intenzit Markovova pro- cesu se spojitým časem na základě diskrétně pozorovaných dat. Začátek práce je věnován jednoduššímu odhadu ze spojité trajektorie pomocí metody maximální věrohodnosti. Dále je zde popsán odhad z diskrétní trajektorie přes výpočet ma- tice pravděpodobností přechodu. Následně je velmi podrobně rozebrán EM al- goritmus, který předchozí odhad zpřesňuje. Na závěr teoretické části je uvedena metoda odhadu zvaná Monte Carlo Markov Chain. Všechny postupy jsou zároveň implementovány v počítačovém softwaru a prezentace jejich výsledk· je obsahem druhé části práce. V té jsou porovnané odhady pro denní, týdenní a měsíční po- zorování a také pro pětiletou a desetiletou pozorovanou trajektorii. K výsledk·m jsou připojeny odhady rozptyl· a intervaly spolehlivosti. 1
|
116 |
Methods and algorithms to learn spatio-temporal changes from longitudinal manifold-valued observations / Méthodes et algorithmes pour l’apprentissage de modèles d'évolution spatio-temporels à partir de données longitudinales sur une variétéSchiratti, Jean-Baptiste 23 January 2017 (has links)
Dans ce manuscrit, nous présentons un modèle à effets mixtes, présenté dans un cadre Bayésien, permettant d'estimer la progression temporelle d'un phénomène biologique à partir d'observations répétées, à valeurs dans une variété Riemannienne, et obtenues pour un individu ou groupe d'individus. La progression est modélisée par des trajectoires continues dans l'espace des observations, que l'on suppose être une variété Riemannienne. La trajectoire moyenne est définie par les effets mixtes du modèle. Pour définir les trajectoires de progression individuelles, nous avons introduit la notion de variation parallèle d'une courbe sur une variété Riemannienne. Pour chaque individu, une trajectoire individuelle est construite en considérant une variation parallèle de la trajectoire moyenne et en reparamétrisant en temps cette parallèle. Les transformations spatio-temporelles sujet-spécifiques, que sont la variation parallèle et la reparamétrisation temporelle sont définnies par les effets aléatoires du modèle et permettent de quantifier les changements de direction et vitesse à laquelle les trajectoires sont parcourues. Le cadre de la géométrie Riemannienne permet d'utiliser ce modèle générique avec n'importe quel type de données définies par des contraintes lisses. Une version stochastique de l'algorithme EM, le Monte Carlo Markov Chains Stochastic Approximation EM (MCMC-SAEM), est utilisé pour estimer les paramètres du modèle au sens du maximum a posteriori. L'utilisation du MCMC-SAEM avec un schéma numérique permettant de calculer le transport parallèle est discutée dans ce manuscrit. De plus, le modèle et le MCMC-SAEM sont validés sur des données synthétiques, ainsi qu'en grande dimension. Enfin, nous des résultats obtenus sur différents jeux de données liés à la santé. / We propose a generic Bayesian mixed-effects model to estimate the temporal progression of a biological phenomenon from manifold-valued observations obtained at multiple time points for an individual or group of individuals. The progression is modeled by continuous trajectories in the space of measurements, which is assumed to be a Riemannian manifold. The group-average trajectory is defined by the fixed effects of the model. To define the individual trajectories, we introduced the notion of « parallel variations » of a curve on a Riemannian manifold. For each individual, the individual trajectory is constructed by considering a parallel variation of the average trajectory and reparametrizing this parallel in time. The subject specific spatiotemporal transformations, namely parallel variation and time reparametrization, are defined by the individual random effects and allow to quantify the changes in direction and pace at which the trajectories are followed. The framework of Riemannian geometry allows the model to be used with any kind of measurements with smooth constraints. A stochastic version of the Expectation-Maximization algorithm, the Monte Carlo Markov Chains Stochastic Approximation EM algorithm (MCMC-SAEM), is used to produce produce maximum a posteriori estimates of the parameters. The use of the MCMC-SAEM together with a numerical scheme for the approximation of parallel transport is discussed. In addition to this, the method is validated on synthetic data and in high-dimensional settings. We also provide experimental results obtained on health data.
|
117 |
MULTI-STATE MODELS WITH MISSING COVARIATESLou, Wenjie 01 January 2016 (has links)
Multi-state models have been widely used to analyze longitudinal event history data obtained in medical studies. The tools and methods developed recently in this area require the complete observed datasets. While, in many applications measurements on certain components of the covariate vector are missing on some study subjects. In this dissertation, several likelihood-based methodologies were proposed to deal with datasets with different types of missing covariates efficiently when applying multi-state models.
Firstly, a maximum observed data likelihood method was proposed when the data has a univariate missing pattern and the missing covariate is a categorical variable. The construction of the observed data likelihood function is based on the model of a joint distribution of the response longitudinal event history data and the discrete covariate with missing values.
Secondly, we proposed a maximum simulated likelihood method to deal with the missing continuous covariate when applying multi-state models. The observed data likelihood function was approximated by using the Monte Carlo simulation method.
At last, an EM algorithm was used to deal with multiple missing covariates when estimating the parameters of multi-state model. The EM algorithm would be able to handle multiple missing discrete covariates in general missing pattern efficiently.
All the proposed methods are justified by simulation studies and applications to the datasets from the SMART project, a consortium of 11 different high-quality longitudinal studies of aging and cognition.
|
118 |
Functional clustering methods and marital fertility modellingArnqvist, Per January 2017 (has links)
This thesis consists of two parts.The first part considers further development of a model used for marital fertility, the Coale-Trussell's fertility model, which is based on age-specific fertility rates. A new model is suggested using individual fertility data and a waiting time after pregnancies. The model is named the waiting model and can be understood as an alternating renewal process with age-specific intensities. Due to the complicated form of the waiting model and the way data is presented, as given in the United Nation Demographic Year Book 1965, a normal approximation is suggested together with a normal approximation of the mean and variance of the number of births per summarized interval. A further refinement of the model was then introduced to allow for left truncated and censored individual data, summarized as table data. The waiting model suggested gives better understanding of marital fertility and by a simulation study it is shown that the waiting model outperforms the Coale-Trussell model when it comes to estimating the fertility intensity and to predict the mean and variance of the number of births for a population. The second part of the thesis focus on developing functional clustering methods.The methods are motivated by and applied to varved (annually laminated) sediment data from lake Kassj\"on in northern Sweden. The rich but complex information (with respect to climate) in the varves, including the shapes of the seasonal patterns, the varying varve thickness, and the non-linear sediment accumulation rates makes it non-trivial to cluster the varves. Functional representations, smoothing and alignment are functional data tools used to make the seasonal patterns comparable.Functional clustering is used to group the seasonal patterns into different types, which can be associated with different weather conditions. A new non-parametric functional clustering method is suggested, the Bagging Voronoi K-mediod Alignment algorithm, (BVKMA), which simultaneously clusters and aligns spatially dependent curves. BVKMA is used on the varved lake sediment, to infer on climate, defined as frequencies of different weather types, over longer time periods. Furthermore, a functional model-based clustering method is proposed that clusters subjects for which both functional data and covariates are observed, allowing different covariance structures in the different clusters. The model extends a model-based functional clustering method proposed by James and Suger (2003). An EM algorithm is derived to estimate the parameters of the model.
|
119 |
Inferência e diagnósticos em modelos assimétricos / Inference and diagnostics in asymmetric modelsFerreira, Clécio da Silva 20 March 2008 (has links)
Este trabalho apresenta um estudo de inferência e diagnósticos em modelos assimétricos. A análise de influência é baseada na metodologia para modelos com dados incompletos, que é relacionada ao algoritmo EM (Zhu e Lee, 2001). Além dos modelos de regressão Normal Assimétrico (Azzalini, 1999) e t-Normal Assimétrico (Gómez, Venegas e Bolfarine, 2007) existentes, são desenvolvidas duas novas classes de modelos, denominados modelos de misturas de escala normal assimétricos (englobando as distribuições Normal, t-Normal, Slash, Normal-Contaminada e Exponencial-potência Assimétricas) e modelos lineares mistos robustos assimétricos, utilizando distribuições de misturas de escalas normais assimétricas para o efeito aleatório e distribuições de misturas de escalas para o erro aleatório. Para o modelo misto, a matriz de informação de Fisher observada é calculada utilizando a aproximação de Louis (1982) para dados incompletos. Para todos os modelos, algoritmos tipo EM são desenvolvidos de forma a fornecer uma solução numérica para os parâmetros dos modelos de regressão. Para cada modelo de regressão, medidas de bondade de ajuste são realizadas via inspeção visual do gráfico de envelope simulado. Para os modelos de misturas de escalas normais assimétricos, um estudo de robustez do algoritmo EM proposto é desenvolvido, determinando a eficácia dos estimadores apresentados. Aplicações dos modelos estudados são realizadas para os conjuntos de dados do Australian Institute of Sports (AIS), para o conjunto de dados sobre qualidade de vida de pacientes (mulheres) com câncer de mama, em um estudo realizado pelo Centro de Atenção Integral à Saúde da Mulher (CAISM) em conjunto com a Faculdade de Ciências Médicas, da Universidade Estadual de Campinas e para o conjunto de dados de colesterol de Framingham. / This work presents a study of inference and diagnostic in asymmetric models. The influence analysis is based in the methodology for models with incomplete data, that is related to the algorithm EM (Zhu and Lee, 2001). Beyond of the existing asymmetric normal (Azzalini, 1999) and t-Normal asymmetric (Gómez, Venegas and Bolfarine, 2007) regression models, are developed two new classes of models, namely asymmetric normal scale mixture models (embodying the asymmetric Normal, t-Normal, Slash, Contaminated-Normal and Power-Exponential distributions) and asymmetric robust linear mixed models, utilizing asymmetric normal scale mixture distributions for the random effect and normal scale mixture distributions for the random error. For the mixed model, the observed Fisher information matrix is calculated using the Louis\' (1982) approach for incomplete data. For all models, EM algorithms are developed, that provide a numeric solution for the parameters of the regression models. For each regression model, measures of goodness of fit are realized through visual inspection of the graphic of simulated envelope. For the asymmetric normal scale mixture models, a study of robustness of the proposed EM algorithm is developed to determine the efficacy of the presented estimators. Applications of the studied models are made for the data set of the Australian Institute of Sports (AIS), for the data set about quality of life of patients (women) with breast cancer, in a study made by Centro de Atenção Integral à Saúde da Mulher (CAISM) in conjoint with the Medical Sciences Faculty, of the Campinas State\'s University and for the data set of Framingham\'s cholesterol study.
|
120 |
Modelos de mistura para dados com distribuições Poisson truncadas no zero / Mixture models for data with zero truncated Poisson distributionsGigante, Andressa do Carmo 22 September 2017 (has links)
Modelo de mistura de distribuições tem sido utilizado desde longa data, mas ganhou maior atenção recentemente devido ao desenvolvimento de métodos de estimação mais eficientes. Nesta dissertação, o modelo de mistura foi utilizado como uma forma de agrupar ou segmentar dados para as distribuições Poisson e Poisson truncada no zero. Para solucionar o problema do truncamento foram estudadas duas abordagens. Na primeira, foi considerado o truncamento em cada componente da mistura, ou seja, a distribuição Poisson truncada no zero. E, alternativamente, o truncamento na resultante do modelo de mistura utilizando a distribuição Poisson usual. As estimativas dos parâmetros de interesse do modelo de mistura foram calculadas via metodologia de máxima verossimilhança, sendo necessária a utilização de um método iterativo. Dado isso, implementamos o algoritmo EM para estimar os parâmetros do modelo de mistura para as duas abordagens em estudo. Para analisar a performance dos algoritmos construídos elaboramos um estudo de simulação em que apresentaram estimativas próximas dos verdadeiros valores dos parâmetros de interesse. Aplicamos os algoritmos à uma base de dados real de uma determinada loja eletrônica e para determinar a escolha do melhor modelo utilizamos os critérios de seleção de modelos AIC e BIC. O truncamento no zero indica afetar mais a metodologia na qual aplicamos o truncamento em cada componente da mistura, tornando algumas estimativas para a distribuição Poisson truncada no zero com viés forte. Ao passo que, na abordagem em que empregamos o truncamento no zero diretamente no modelo as estimativas apontaram menor viés. / Mixture models has been used since long but just recently attracted more attention for the estimations methods development more efficient. In this dissertation, we consider the mixture model like a method for clustering or segmentation data with the Poisson and Poisson zero truncated distributions. About the zero truncation problem we have two emplacements. The first, consider the zero truncation in the mixture component, that is, we used the Poisson zero truncated distribution. And, alternatively, we do the zero truncation in the mixture model applying the usual Poisson. We estimated parameters of interest for the mixture model through maximum likelihood estimation method in which we need an iterative method. In this way, we implemented the EM algorithm for the estimation of interested parameters. We apply the algorithm in one real data base about one determined electronic store and towards determine the better model we use the criterion selection AIC and BIC. The zero truncation appear affect more the method which we truncated in the component mixture, return some estimates with strong bias. In the other hand, when we truncated the zero directly in the model the estimates pointed less bias.
|
Page generated in 0.0611 seconds