111

EM algorithm for Markov chains observed via Gaussian noise and point process information: Theory and case studies

Damian, Camilla, Eksi-Altay, Zehra, Frey, Rüdiger January 2018 (has links) (PDF)
In this paper we study parameter estimation via the Expectation Maximization (EM) algorithm for a continuous-time hidden Markov model with diffusion and point process observation. Inference problems of this type arise, for instance, in credit risk modelling. A key step in the application of the EM algorithm is the derivation of finite-dimensional filters for the quantities needed in the E-step of the algorithm. In this context we obtain exact, unnormalized and robust filters, and we discuss their numerical implementation. Moreover, we propose several goodness-of-fit tests for hidden Markov models with Gaussian noise and point process observation. We run an extensive simulation study to test the speed and accuracy of our methodology. The paper closes with an application to credit risk: we estimate the parameters of a hidden Markov model for credit quality, where the observations consist of rating transitions and credit spreads for US corporations.
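As an illustrative aside only (not the authors' continuous-time filters), the following minimal Python sketch shows the E-step/M-step structure of EM for the simpler discrete-time analogue: a finite-state Markov chain observed in Gaussian noise. The point-process observation channel of the paper is omitted and all names are hypothetical.

    import numpy as np

    def em_step(y, P, mu, sigma, pi0):
        """One EM (Baum-Welch) iteration for a discrete-time hidden Markov
        chain observed in Gaussian noise: y_t = mu[X_t] + sigma * eps_t."""
        y, P, mu, pi0 = map(lambda a: np.asarray(a, float), (y, P, mu, pi0))
        T, K = len(y), len(mu)
        # emission densities b[t, k] = N(y_t; mu_k, sigma^2)
        b = np.exp(-0.5 * ((y[:, None] - mu[None, :]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

        # E-step: scaled forward-backward recursions
        alpha = np.zeros((T, K)); beta = np.ones((T, K)); c = np.zeros(T)
        alpha[0] = pi0 * b[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ P) * b[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        for t in range(T - 2, -1, -1):
            beta[t] = (P @ (b[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                        # smoothed state probabilities P(X_t = k | y)
        xi = np.zeros((K, K))                       # expected transition counts
        for t in range(T - 1):
            xi += (alpha[t][:, None] * P * (b[t + 1] * beta[t + 1])[None, :]) / c[t + 1]

        # M-step: re-estimate transition matrix, state means and noise level
        P_new = xi / xi.sum(axis=1, keepdims=True)
        mu_new = (gamma * y[:, None]).sum(axis=0) / gamma.sum(axis=0)
        sigma_new = np.sqrt((gamma * (y[:, None] - mu_new[None, :]) ** 2).sum() / T)
        return P_new, mu_new, sigma_new, gamma[0]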
112

Shluková analýza pro funkcionální data / Cluster analysis for functional data

Zemanová, Barbora January 2012 (has links)
In this work we deal with cluster analysis for functional data. Functional data consist of a set of subjects that are characterized by repeated measurements of a variable. Based on these measurements we want to split the subjects into groups (clusters): subjects within a single cluster should be similar to each other and differ from subjects in the other clusters. The first approach we use is dimension reduction followed by K-means clustering. The second approach is to use a finite mixture of normal linear mixed models, whose parameters we estimate by maximum likelihood using the EM algorithm. Throughout the work we apply all described procedures to real meteorological data.
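A minimal sketch of the first approach described above (dimension reduction followed by K-means), assuming the curves have been discretized onto a common grid of measurement times; the thesis may use a different basis expansion or number of components.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_curves(X, n_components=3, n_clusters=4):
        """Cluster functional data by projecting the discretized curves onto
        their leading principal components and running K-means.
        X has one row per subject and one column per measurement time."""
        Xc = X - X.mean(axis=0)                            # centre the curves
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # functional PCA on the grid
        scores = U[:, :n_components] * s[:n_components]    # PCA scores per subject
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(scores)
        return labels, scores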
113

Inferência e diagnósticos em modelos assimétricos / Inference and diagnostics in asymmetric models

Clécio da Silva Ferreira 20 March 2008 (has links)
This work presents a study of inference and diagnostics in asymmetric models. The influence analysis is based on the methodology for models with incomplete data, which is related to the EM algorithm (Zhu and Lee, 2001). In addition to the existing asymmetric normal (Azzalini, 1999) and asymmetric t-normal (Gómez, Venegas and Bolfarine, 2007) regression models, two new classes of models are developed: asymmetric normal scale mixture models (encompassing the asymmetric Normal, t-Normal, Slash, Contaminated-Normal and Power-Exponential distributions) and asymmetric robust linear mixed models, which use asymmetric normal scale mixture distributions for the random effects and normal scale mixture distributions for the random errors. For the mixed model, the observed Fisher information matrix is computed using Louis' (1982) approach for incomplete data. For all models, EM-type algorithms are developed that provide numerical solutions for the regression parameters. For each regression model, goodness of fit is assessed by visual inspection of simulated envelope plots. For the asymmetric normal scale mixture models, a robustness study of the proposed EM algorithm is carried out to determine the efficacy of the presented estimators. The models are applied to the Australian Institute of Sport (AIS) data set, to a data set on the quality of life of patients (women) with breast cancer from a study conducted by the Centro de Atenção Integral à Saúde da Mulher (CAISM) together with the Faculty of Medical Sciences of the State University of Campinas, and to the Framingham cholesterol data.
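As background (a hedged sketch of the standard convolution-type stochastic representation of the skew-normal family, not necessarily the exact parameterization used in the thesis), the latent half-normal variable below is what EM-type algorithms for such models typically treat as missing data.

    import numpy as np

    def sample_skew_normal(n, mu=0.0, sigma=1.0, shape=3.0, rng=None):
        """Draw from a skew-normal distribution via the representation
        Z = delta*|Z0| + sqrt(1 - delta^2)*Z1 with delta = shape / sqrt(1 + shape^2),
        where Z0, Z1 are independent standard normals. The half-normal |Z0|
        is the latent variable augmented in EM algorithms for these models."""
        rng = rng or np.random.default_rng()
        delta = shape / np.sqrt(1.0 + shape ** 2)
        z0 = np.abs(rng.standard_normal(n))        # half-normal latent variable
        z1 = rng.standard_normal(n)
        return mu + sigma * (delta * z0 + np.sqrt(1.0 - delta ** 2) * z1)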
114

Stochastic process analysis for Genomics and Dynamic Bayesian Networks inference.

Lebre, Sophie 14 September 2007 (has links) (PDF)
This thesis is dedicated to the development of statistical and computational methods for the analysis of DNA sequences and gene expression time series.

First, we study a parsimonious Markov model called the Mixture Transition Distribution (MTD) model, which is a mixture of Markovian transitions. The large number of constraints on the parameters of this model hampers the derivation of an analytical expression for the maximum likelihood estimate (MLE). We propose to approximate the MLE with an EM algorithm. After comparing the performance of this algorithm with results from the literature, we use it to evaluate the relevance of MTD modeling for bacterial DNA coding sequences in comparison with standard Markovian modeling.

Then we propose two different approaches for recovering genetic regulation networks. We model these networks with Dynamic Bayesian Networks (DBNs), whose edges describe the dependency relationships between time-delayed gene expression levels. The aim is to estimate the topology of this graph despite the very small number of repeated measurements compared with the number of observed genes.

To face this problem of dimension, we first assume that the dependency relationships are homogeneous, that is, the graph topology is constant across time. We then propose to approximate this graph by considering partial-order dependencies. The concept of partial-order dependence graphs, already introduced for static and undirected graphs, is adapted and characterized for DBNs using the theory of graphical models. From these results, we develop a deterministic procedure for DBN inference.

Finally, we relax the homogeneity assumption by considering a succession of several homogeneous phases, modelled with a multiple-changepoint regression model. Each changepoint indicates a change in the regression parameters, that is, in the way an expression level depends on the others. Using reversible-jump MCMC methods, we develop a stochastic algorithm that simultaneously infers the changepoint locations and the structure of the network within the phases delimited by the changepoints.

Both approaches are validated on simulated and real data.
115

A Note on the Generalization Performance of Kernel Classifiers with Margin

Evgeniou, Theodoros, Pontil, Massimiliano 01 May 2000 (has links)
We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers stem from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e., functions of the slack variables of SVM) are derived.
116

On perfect simulation and EM estimation

Larson, Kajsa January 2010 (has links)
Perfect simulation and the EM algorithm are the main topics of this thesis. In Paper I, we present coupling-from-the-past (CFTP) algorithms that generate perfectly distributed samples from the multi-type Widom--Rowlinson (W--R) model and some generalizations of it. The classical W--R model is a point process in the plane or in space consisting of points of several different types. Points of different types are not allowed to be closer than some specified distance, whereas points of the same type can be arbitrarily close. A stick model and soft-core generalizations are also considered. Further, we generate samples without edge effects, and give a bound on how small the intensities (of the points) must be for the algorithm to terminate. In Paper II, we consider the forestry problem of estimating seedling dispersal distributions and effective plant fecundities from spatial data on adult trees and seedlings when the origins of the seedlings are unknown. Traditional models for fecundity build on allometric assumptions, where the fecundity is related to some characteristic of the adult tree (e.g. diameter). However, the allometric assumptions are generally too restrictive and lead to unrealistic estimates. Therefore we present a new model, the unrestricted fecundity (UF) model, which uses no allometric assumptions. We propose an EM algorithm to estimate the unknown parameters. Evaluations on real and simulated data indicate better performance for the UF model. In Paper III, we propose EM algorithms to estimate the passage time distribution on a graph. Data are obtained by observing a flow only at the nodes; what happens on the edges is unknown. Therefore the sample of passage times, i.e. the times it takes for the flow to stream between two neighbors, consists of right-censored and uncensored observations, and it is sometimes unknown which is which. For discrete passage time distributions, we show that the maximum likelihood (ML) estimate is strongly consistent under certain weak conditions. We also show that our proposed EM algorithm converges to the ML estimate if the sample size is sufficiently large and the starting value is sufficiently close to the true parameter. In a special case we show that it always converges. In the continuous case, we propose an EM algorithm for fitting phase-type distributions to data.
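For intuition only, here is a minimal sketch of an EM iteration for a much simpler special case than Paper III treats: exponentially distributed passage times with known censoring status (the thesis handles discrete and phase-type distributions and the harder setting where the censoring status itself is unknown). Names and the starting value are hypothetical.

    import numpy as np

    def em_exponential_censored(t, censored, n_iter=200):
        """EM for the rate of an exponential passage-time distribution when
        some observations are right censored.
        t: observed times; censored: boolean array, True if right censored."""
        t = np.asarray(t, float)
        censored = np.asarray(censored, bool)
        lam = 1.0 / t.mean()                       # crude starting value
        for _ in range(n_iter):
            # E-step: a censored time c is replaced by E[T | T > c] = c + 1/lam
            expected = np.where(censored, t + 1.0 / lam, t)
            # M-step: MLE of the rate given the completed data
            lam = len(t) / expected.sum()
        return lam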
117

Comparison Of Missing Value Imputation Methods For Meteorological Time Series Data

Aslan, Sipan 01 September 2010 (has links) (PDF)
Dealing with missing data in spatio-temporal time series constitutes an important branch of the general missing data problem. Since the statistical properties of time-dependent data are characterized by the sequentiality of observations, any interruption of consecutiveness in a time series causes severe problems. To make reliable analyses in this case, missing data must be handled cautiously without disturbing the statistical properties of the series, mainly the temporal and spatial dependencies. In this study we aim to compare several imputation methods for the appropriate completion of missing values in spatio-temporal meteorological time series. For this purpose, several imputation methods are assessed on their performance for artificially created missing data in monthly total precipitation and monthly mean temperature series obtained from the climate stations of the Turkish State Meteorological Service. The artificially created missing data are estimated using six methods. Single Arithmetic Average (SAA), Normal Ratio (NR) and NR Weighted with Correlations (NRWC) are the three simple methods used in the study. In addition, we use two computationally intensive methods for missing data imputation: a Multi-Layer Perceptron type Neural Network (MLPNN) and a Markov Chain Monte Carlo method based on the Expectation-Maximization algorithm (EM-MCMC). We also propose a modification of the EM-MCMC method in which the results of the simple imputation methods are used as auxiliary variables. Besides using accuracy measures based on squared errors, we propose the Correlation Dimension (CD) technique, an important tool of nonlinear dynamic time series analysis, for the appropriate evaluation of imputation performance.
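To make the two simplest methods concrete, here are hedged sketches of the textbook Single Arithmetic Average and Normal Ratio formulas; the exact variants and weights used in the thesis may differ.

    import numpy as np

    def single_arithmetic_average(neighbours):
        """SAA: impute a missing value as the plain mean of the values
        recorded at neighbouring stations for the same month."""
        return np.mean(neighbours)

    def normal_ratio(neighbours, neighbour_means, target_mean):
        """NR: weight each neighbouring station's value by the ratio of the
        target station's long-term mean to that neighbour's long-term mean."""
        neighbours = np.asarray(neighbours, float)
        neighbour_means = np.asarray(neighbour_means, float)
        return np.mean(target_mean / neighbour_means * neighbours)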
118

Additive Latent Variable (ALV) Modeling: Assessing Variation in Intervention Impact in Randomized Field Trials

Toyinbo, Peter Ayo 23 October 2009 (has links)
In order to personalize or tailor treatments to maximize impact among different subgroups, there is a need to model not only the main effects of an intervention but also the variation in intervention impact by baseline individual-level risk characteristics. To this end, a suitable statistical model allows researchers to answer a major research question: who benefits from, or is harmed by, this intervention program? Commonly in social and psychological research, the baseline risk may be unobservable and has to be estimated from observed indicators that are measured with error; it may also have a nonlinear relationship with the outcome. Most existing nonlinear structural equation models (SEMs) developed to address such problems employ polynomial or fully parametric nonlinear functions to define the structural equations. These methods are limited because they require functional forms to be specified beforehand, and even when the models include higher-order polynomials there may be problems when the focus of interest is the function over its whole domain. The goal of this work is to develop a more flexible statistical modeling technique for assessing complex relationships between a proximal/distal outcome and (1) baseline characteristics measured with error and (2) baseline-treatment interactions, such that the shapes of these relationships are data driven and need not be determined a priori. In the ALV model structure, the nonlinear components of the regression equations are represented as a generalized additive model (GAM) or a generalized additive mixed-effects model (GAMM). Replication study results show that the ALV model estimates of the underlying relationships in the data are sufficiently close to the true pattern. The ALV modeling technique allows researchers to assess how an intervention affects individuals differently as a function of a baseline risk that is itself measured with error, and to uncover complex relationships in the data that might otherwise be missed. Although the ALV approach is computationally intensive, it relieves its users of the need to decide on functional forms before the model is run. It can be extended to examine complex nonlinearity between growth factors and distal outcomes in a longitudinal study.
119

Essays on Trade Agreements, Agricultural Commodity Prices and Unconditional Quantile Regression

Li, Na 03 January 2014 (has links)
My dissertation consists of three essays in three different areas: international trade, agricultural markets, and nonparametric econometrics. The first and third essays are theoretical, while the second is empirical. In the first essay, I develop a political economy model of trade agreements in which the set of policy instruments is endogenously determined, providing a rationale for countervailing duties (CVDs). Trade-related policy intervention is assumed to be largely shaped in response to rent-seeking demand, as is often shown empirically. Consequently, the uncertain circumstances during the lifetime of a trade agreement involve both economic and rent-seeking conditions. The latter approximates actual trade policy decisions more closely than the externality hypothesis and thus provides scope for empirical testing. The second essay tests whether normal mixture (NM) generalized autoregressive conditional heteroscedasticity (GARCH) models adequately capture the relevant properties of agricultural commodity prices. Volatility series were constructed for ten agricultural commodity weekly cash prices. NM-GARCH models allow for heterogeneous volatility dynamics among different market regimes. Both in-sample fit and out-of-sample forecasting tests confirm that the two-state NM-GARCH approach performs significantly better than the traditional normal GARCH model. For each commodity, it is found that an expected negative price change corresponds to a higher volatility persistence, while an expected positive price change arises in conjunction with a greater responsiveness of volatility. In the third essay, I propose an estimator for a nonparametric additive unconditional quantile regression model. Unconditional quantile regression is able to assess the possibly different impacts of covariates on different unconditional quantiles of a response variable. The proposed estimator does not require d-dimensional nonparametric regression and therefore does not suffer from the curse of dimensionality. In addition, the estimator has an oracle property in the sense that the asymptotic distribution of each additive component is the same as when all other components are known. Both numerical simulations and an empirical application suggest that the new estimator performs much better than the alternatives. / The Canadian Agricultural Trade Policy and Competitiveness Research Network, the Structure and Performance of Agriculture and Agri-products Industry Network, and the Institute for the Advanced Study of Food and Agricultural Policy.
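A rough sketch of the likelihood behind the second essay's modeling approach, assuming a two-state NM(2)-GARCH(1,1) specification with zero component means; parameter names, ordering and initialization are hypothetical, and the dissertation's exact specification may differ.

    import numpy as np

    def nm2_garch11_loglik(eps, params):
        """Log-likelihood of a two-regime normal mixture GARCH(1,1) model:
        each regime k has its own variance recursion
            sigma2_k[t] = omega_k + alpha_k * eps[t-1]**2 + beta_k * sigma2_k[t-1]
        and eps[t] is drawn from N(0, sigma2_k[t]) with mixing weight p_k."""
        eps = np.asarray(eps, float)
        p1, omega1, alpha1, beta1, omega2, alpha2, beta2 = params
        w = np.array([p1, 1.0 - p1])
        T = len(eps)
        sigma2 = np.empty((T, 2))
        sigma2[0] = eps.var()                     # initialize both regimes at the sample variance
        for t in range(1, T):
            sigma2[t, 0] = omega1 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1, 0]
            sigma2[t, 1] = omega2 + alpha2 * eps[t - 1] ** 2 + beta2 * sigma2[t - 1, 1]
        dens = np.exp(-0.5 * eps[:, None] ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        return np.log(dens @ w).sum()             # mixture likelihood, summed over time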
120

Statistical Methods for Life History Analysis Involving Latent Processes

Shen, Hua January 2014 (has links)
Incomplete data often arise in the study of life history processes. Examples include missing responses, missing covariates, and unobservable latent processes, in addition to right censoring. This thesis develops statistical models and methods to address these problems as they arise in oncology and chronic disease. Methods of estimation and inference in parametric, weakly parametric and semiparametric settings are investigated.

Studies of chronic diseases routinely sample individuals subject to conditions on an event time of interest. In epidemiology, for example, prevalent cohort studies aiming to evaluate risk factors for survival following the onset of dementia require subjects to have survived to the point of screening. In clinical trials designed to assess the effect of experimental cancer treatments on survival, patients are required to survive from the time of cancer diagnosis to recruitment. Such conditions yield samples featuring left-truncated event time distributions. Incomplete covariate data often arise in such settings, but standard methods do not deal with the fact that the covariate distribution is also affected by left truncation. We develop a likelihood and an estimation algorithm for dealing with incomplete covariate data in such settings. An expectation-maximization algorithm deals with the left truncation by using the covariate distribution conditional on the selection criterion. An extension to sub-group analyses in clinical trials is described for the case in which the stratification variable is incompletely observed.

In studies of affective disorder, individuals are often observed to experience recurrent symptomatic exacerbations warranting hospitalization. Interest lies in modeling the occurrence of such exacerbations over time and identifying associated risk factors to better understand the disease process. In some patients, recurrent exacerbations are temporally clustered following disease onset but cease to occur after a period of time. We develop a dynamic mover-stayer model in which a canonical binary variable associated with each event indicates whether the underlying disease has resolved. An individual whose disease process has not resolved experiences events according to a standard point process model governed by a latent intensity. If and when the disease process resolves, the complete-data intensity becomes zero and no further events arise. An expectation-maximization algorithm is developed for parametric and semiparametric model fitting based on a discrete-time dynamic mover-stayer model and a latent intensity-based model of the underlying point process. The method is applied to a motivating dataset from a cohort of individuals with affective disorder experiencing recurrent hospitalization for their mental health disorder.

Interval-censored recurrent event data arise when the event of interest is not readily observed but the cumulative event count can be recorded at periodic assessment times. Extensions of the model-fitting techniques for the dynamic mover-stayer model that incorporate interval censoring are discussed. The likelihood and estimation algorithm are developed for piecewise-constant baseline rate functions and are shown to yield estimators with small empirical bias in simulation studies. Data on the cumulative number of damaged joints in patients with psoriatic arthritis are analysed to provide an illustrative application.
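As a purely illustrative sketch (hypothetical parameter names, discrete time, and the simplest version of the resolution mechanism), the following simulates cumulative event counts from a dynamic mover-stayer process of the kind described above: events occur while the disease is unresolved, and after each event the disease may resolve, after which no further events arise.

    import numpy as np

    def simulate_mover_stayer(n_subjects, n_periods, p_event=0.3, p_resolve=0.1, rng=None):
        """Simulate a discrete-time dynamic mover-stayer recurrent event process
        and return the cumulative event count per subject at each period."""
        rng = rng or np.random.default_rng()
        counts = np.zeros((n_subjects, n_periods), dtype=int)
        for i in range(n_subjects):
            resolved = False
            for t in range(n_periods):
                if not resolved and rng.random() < p_event:
                    counts[i, t] = 1
                    resolved = rng.random() < p_resolve   # disease may resolve after an event
        return counts.cumsum(axis=1)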
