81

Gaussian mixtures in R

Marek, Petr January 2015
Using Gaussian mixtures is a popular and very flexible approach to statistical modelling. The standard approach of maximum likelihood estimation cannot be applied directly to some of these models. The estimates are, however, obtainable by iterative solutions, such as the EM (Expectation-Maximization) algorithm. The aim of this thesis is to present Gaussian mixture models and their implementation in R. The non-trivial case of having to use the EM algorithm is assumed. Existing methods and packages are presented, investigated, and compared. Some of them are extended by custom R code. Several exhaustive simulations are run and some of the interesting results are presented. For these simulations, a notion of usual fit is introduced.
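
To make the setting concrete, below is a minimal sketch of the kind of EM fit discussed in this thesis: a two-component univariate Gaussian mixture estimated with a hand-written EM loop in R. It is an illustrative example only (the initialization, simulated data, and stopping rule are invented here), not the thesis's own code; in practice one would typically also compare against established packages such as mclust or mixtools.

```r
# Hand-written EM for a two-component univariate Gaussian mixture (sketch).
em_gmm2 <- function(x, max_iter = 200, tol = 1e-8) {
  pi1 <- 0.5                                         # crude starting values
  mu  <- quantile(x, c(0.25, 0.75), names = FALSE)
  s2  <- rep(var(x), 2)
  ll_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each point belongs to component 1
    d1 <- pi1 * dnorm(x, mu[1], sqrt(s2[1]))
    d2 <- (1 - pi1) * dnorm(x, mu[2], sqrt(s2[2]))
    g  <- d1 / (d1 + d2)
    ll <- sum(log(d1 + d2))                          # observed-data log-likelihood
    # M-step: weighted updates of the mixture parameters
    pi1   <- mean(g)
    mu[1] <- sum(g * x) / sum(g)
    mu[2] <- sum((1 - g) * x) / sum(1 - g)
    s2[1] <- sum(g * (x - mu[1])^2) / sum(g)
    s2[2] <- sum((1 - g) * (x - mu[2])^2) / sum(1 - g)
    if (ll - ll_old < tol) break
    ll_old <- ll
  }
  list(pi = c(pi1, 1 - pi1), mu = mu, sigma = sqrt(s2), loglik = ll)
}

set.seed(1)
x <- c(rnorm(300, 0, 1), rnorm(200, 3, 0.5))         # simulated mixture data
em_gmm2(x)
```
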
82

Bias in mixtures of normal distributions and joint modeling of longitudinal and time-to-event data with monotonic change curves

Lourens, Spencer 01 May 2015
Estimating parameters in a mixture of normal distributions dates back to the 19th century, when Pearson originally considered data on crabs from the Bay of Naples. Since then, many real-world applications of mixtures have led to various proposed methods for studying similar problems. Among them, maximum likelihood estimation (MLE) and the continuous empirical characteristic function (CECF) methods have drawn the most attention. However, the performance of these competing estimation methods has not been thoroughly studied in the literature, and conclusions have not been consistent in published research. In this article, we review this classical problem with a focus on estimation bias. An extensive simulation study is conducted to compare the estimation bias between the MLE and CECF methods over a wide range of disparity values. We use the overlapping coefficient (OVL) to measure the amount of disparity and provide a practical guideline for estimation quality in mixtures of normal distributions. Application to an ongoing multi-site Huntington disease study is illustrated for ascertaining cognitive biomarkers of disease progression. We also study joint modeling of longitudinal and time-to-event data and discuss pattern-mixture and selection models, but focus on shared parameter models, which utilize unobserved random effects to "join" a marginal longitudinal data model and a marginal survival model in order to assess an internal time-dependent covariate's effect on time-to-event. The marginal models used in the analysis are the Cox proportional hazards model and the linear mixed model, and both of these models are covered in some detail before defining joint models and describing the estimation process. Joint modeling provides a framework that accounts for correlation between the longitudinal data and the time-to-event data, while also accounting for measurement error in the longitudinal process, which previous methods failed to do. Since it has been shown that bias is incurred, and that this bias is proportional to the amount of measurement error, a joint modeling approach is preferred. Our setting is further complicated by monotone degeneration of the internal covariate considered, and so a joint model which uses monotone B-splines to recover the longitudinal trajectory and a Cox proportional hazards (CPH) model for the time-to-event data is proposed. The monotonicity constraints are satisfied via the projected Newton-Raphson algorithm as described by Cheng et al. (2012), with the baseline hazard profiled out of the Q-function in each M-step of the expectation-maximization (EM) algorithm used for optimizing the observed likelihood. This method is applied to assess the ability of the Total Motor Score (TMS) to predict Huntington disease motor diagnosis in data from the Biological Predictors of Huntington's Disease study (PREDICT-HD).
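
As a small aside on the disparity measure used above: the overlapping coefficient between two normal components can be computed by numerical integration of the pointwise minimum of the two densities. The R sketch below (with made-up parameter values, not the study's settings) illustrates the idea.

```r
# Overlapping coefficient OVL = integral of min(f1, f2) for two normal
# densities, computed by numerical integration (illustrative sketch).
ovl_normal <- function(mu1, sd1, mu2, sd2) {
  f  <- function(x) pmin(dnorm(x, mu1, sd1), dnorm(x, mu2, sd2))
  lo <- min(mu1 - 10 * sd1, mu2 - 10 * sd2)
  hi <- max(mu1 + 10 * sd1, mu2 + 10 * sd2)
  integrate(f, lower = lo, upper = hi)$value
}

ovl_normal(0, 1, 0, 1)   # identical components: OVL = 1
ovl_normal(0, 1, 3, 1)   # well-separated components: OVL is about 0.13
```
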
83

ON THE PREDICTIVE PERFORMANCE OF THE STOCK RETURNS BY USING THE MARKOV-SWITCHING MODELS

Wu, Yanan January 2020
This paper proposes using the basic predictive regression and Markov regime-switching regression to predict excess stock returns in both the US and Swedish stock markets. The analysis shows that the Markov regime-switching regression models outperform the linear ones in out-of-sample forecasting, because the regime-switching models better capture economic expansions and recessions.
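
For readers unfamiliar with the model, the R sketch below simulates excess returns from a two-state Markov-switching process: a calm "expansion" regime and a volatile "recession" regime. All parameter values are invented for illustration and are not estimates from this paper; estimation itself would typically be done with a dedicated package (for example, MSwM on CRAN).

```r
# Simulate excess returns from a two-state Markov-switching model (sketch).
set.seed(42)
n     <- 1000
P     <- matrix(c(0.97, 0.03,          # regime transition probabilities
                  0.10, 0.90), nrow = 2, byrow = TRUE)
mu    <- c( 0.008, -0.010)             # regime-specific mean excess return
sigma <- c( 0.030,  0.060)             # regime-specific volatility

state    <- integer(n)
state[1] <- 1
for (i in 2:n) state[i] <- sample(1:2, 1, prob = P[state[i - 1], ])
returns <- rnorm(n, mean = mu[state], sd = sigma[state])

# Regime-specific sample moments that a Markov-switching fit tries to recover
tapply(returns, state, mean)
tapply(returns, state, sd)
```
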
84

Modeling IP traffic using the batch Markovian arrival process

Klemm, Alexander, Lindemann, Christoph, Lohmann, Marco 10 December 2018
In this paper, we show how to utilize the expectation-maximization (EM) algorithm for efficient and numerically stable parameter estimation of the batch Markovian arrival process (BMAP). In fact, effective computational formulas for the E-step of the EM algorithm are presented, which utilize the well-known randomization technique and a stable calculation of Poisson jump probabilities. Moreover, we identify the BMAP as an analytically tractable model of choice for aggregated traffic modeling of IP networks. The key idea of this aggregated traffic model lies in customizing the BMAP such that different lengths of IP packets are represented by rewards of the BMAP. Using measured traffic data, a comparative study with the MMPP and the Poisson process illustrates the effectiveness of the customized BMAP for IP traffic modeling by visual inspection of sample paths over several time scales, by presenting important statistical properties, as well as by investigating queueing behavior.
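
The randomization (uniformization) technique mentioned above can be illustrated in isolation: it expresses the action of the matrix exponential of a CTMC generator as a Poisson-weighted sum of powers of a discrete-time transition matrix. The R sketch below shows the idea on a tiny two-state generator with made-up rates; it is not the paper's BMAP estimation code.

```r
# Uniformization: approximate exp(Q * t) %*% v for a generator matrix Q
# using Poisson jump probabilities (illustrative sketch).
expm_action <- function(Q, t, v, eps = 1e-12) {
  lambda <- max(abs(diag(Q)))            # uniformization rate
  P <- diag(nrow(Q)) + Q / lambda        # embedded DTMC transition matrix
  K <- qpois(1 - eps, lambda * t)        # truncation point of the Poisson sum
  w <- dpois(0:K, lambda * t)            # Poisson jump probabilities
  out  <- w[1] * v                       # k = 0 term
  Pk_v <- v
  for (k in 1:K) {
    Pk_v <- P %*% Pk_v                   # builds P^k %*% v iteratively
    out  <- out + w[k + 1] * Pk_v
  }
  out
}

Q <- matrix(c(-2,  2,                    # toy two-state generator
               3, -3), nrow = 2, byrow = TRUE)
expm_action(Q, t = 0.5, v = c(1, 0))
```
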
85

Clustering of temporal gene expression data with mixtures of mixed effects models

Lu, Darlene 27 February 2019
While time-dependent processes are important to biological functions, methods to leverage temporal information from large data have remained computationally challenging. In temporal gene-expression data, clustering can be used to identify genes with shared function in complex processes. Algorithms like K-Means and standard Gaussian mixture models (GMM) fail to account for variability in replicated data or repeated measures over time and require a priori cluster number assumptions, evaluating many cluster numbers to select an optimal result. An improved penalized GMM offers a computationally efficient algorithm to simultaneously optimize cluster number and labels. The work presented in this dissertation was motivated by mouse bone-fracture models and the goal of determining patterns of temporal gene expression during bone-healing progression. To address this, an extension to the penalized GMM was proposed that accounts for correlation between replicated data and repeated measures over time by introducing random effects, using a mixture of mixed-effects polynomial regression models and an entropy-penalized EM algorithm (EPEM). First, the performance of EPEM for different mixed-effects models was assessed with simulation studies and applied to the fracture-healing study. Second, modifications to address the high computational cost of EPEM were considered that either clustered subsets of data determined by predicted polynomial order (S-EPEM) or used modified initialization to decrease the initial burden (I-EPEM). Each was compared to EPEM and applied to the fracture-healing study. Lastly, as varied rates of fracture healing were observed for mice with different genetic backgrounds (strains), a new analysis strategy was proposed to compare patterns of temporal gene expression between different mouse strains and assessed with simulation studies. Expression profiles for each strain were treated as separate objects to cluster in order to determine genes clustered into different groups across strains. We found that the addition of random effects decreased the accuracy of predicted cluster labels compared to K-Means, GMM, and fixed-effects EPEM. Polynomial-order optimization with BIC performed with the highest accuracy, and optimization on subspaces obtained with singular value decomposition performed well. Computation time for S-EPEM was much reduced, with a slight decrease in accuracy. I-EPEM was comparable to EPEM, with similar accuracy and a decrease in computation time. Application of the new analysis strategy to the fracture-healing data identified several distinct temporal gene-expression patterns for the different strains.
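
One building block of the mixture described above can be shown compactly: a quadratic regression for a single gene's time course with a random effect for replicate. The R sketch below uses the lme4 package on simulated data; the full entropy-penalized mixture (EPEM) is beyond a short example, and all names and values here are illustrative.

```r
# Mixed-effects polynomial regression for one simulated gene (sketch).
library(lme4)

set.seed(7)
time      <- rep(0:5, times = 4)                  # 6 time points, 4 replicates
replicate <- factor(rep(1:4, each = 6))
expr <- 1 + 0.8 * time - 0.1 * time^2 +           # quadratic fixed trend
        rnorm(4, sd = 0.5)[replicate] +           # replicate-level random intercept
        rnorm(length(time), sd = 0.3)             # measurement noise

fit <- lmer(expr ~ poly(time, 2) + (1 | replicate))
summary(fit)
```
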
86

BLINDED EVALUATIONS OF EFFECT SIZES IN CLINICAL TRIALS: COMPARISONS BETWEEN BAYESIAN AND EM ANALYSES

Turkoz, Ibrahim January 2013
Clinical trials are major and costly undertakings for researchers. Planning a clinical trial involves careful selection of the primary and secondary efficacy endpoints. The 2010 draft FDA guidance on adaptive designs acknowledges possible study design modifications, such as selection and/or ordering of secondary endpoints, in addition to sample size re-estimation. It is essential for the integrity of a double-blind clinical trial that individual treatment allocation of patients remains unknown. Methods have been proposed for re-estimating the sample size of clinical trials, without unblinding treatment arms, for both categorical and continuous outcomes. Procedures that allow a blinded estimation of the treatment effect, using knowledge of trial operational characteristics, have been suggested in the literature. Clinical trials are designed to evaluate the effects of one or more treatments on multiple primary and secondary endpoints. The multiplicity issues that arise when there is more than one endpoint require careful consideration for controlling the Type I error rate. A wide variety of multiplicity approaches are available to ensure that the probability of making a Type I error is controlled within acceptable pre-specified bounds. The widely used fixed-sequence gatekeeping procedures require prospective ordering of the null hypotheses for secondary endpoints. This prospective ordering is often based on a number of untested assumptions about expected treatment differences, the assumed population variance, and estimated dropout rates. We wish to update the ordering of the null hypotheses based on estimated standardized treatment effects. We show how to do so while the study is ongoing, without unblinding the treatments, without losing the validity of the testing procedure, and while maintaining the integrity of the trial. Our simulations show that we can reliably order the standardized treatment effects, also known as signal-to-noise ratios, even though we are unable to estimate the unstandardized treatment effect. In order to estimate the treatment difference in a blinded setting, we must define a latent variable substituting for the unknown treatment assignment. Approaches that employ the EM algorithm to estimate treatment differences in blinded settings do not provide reliable conclusions about ordering the null hypotheses. We developed Bayesian approaches that enable us to order the secondary null hypotheses. These approaches are based on posterior estimation of signal-to-noise ratios. We demonstrate with simulation studies that our Bayesian algorithms perform better than existing EM algorithm counterparts for ordering effect sizes. Introducing informative priors for the latent variables, in settings where the EM algorithm has been used, typically improves the accuracy of parameter estimation in effect size ordering. We illustrate our method with a secondary analysis of a longitudinal study of depression.
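
For intuition about the blinded EM estimation discussed above, the R sketch below fits a 50:50 two-component normal mixture with a common variance to pooled, blinded outcomes, treating the unknown treatment assignment as the latent variable. It is an illustrative example with simulated data, not the procedure developed in this thesis; as the abstract notes, the standardized effect (signal-to-noise ratio) is the quantity that can be ordered reliably, not the unstandardized difference.

```r
# Blinded EM sketch: 50:50 normal mixture with common variance.
blinded_em <- function(y, max_iter = 500, tol = 1e-8) {
  mu <- quantile(y, c(0.3, 0.7), names = FALSE)   # crude starting values
  s2 <- var(y)
  for (iter in seq_len(max_iter)) {
    d1 <- 0.5 * dnorm(y, mu[1], sqrt(s2))
    d2 <- 0.5 * dnorm(y, mu[2], sqrt(s2))
    g  <- d1 / (d1 + d2)                          # E-step: P(arm 1 | outcome)
    mu_new <- c(sum(g * y) / sum(g), sum((1 - g) * y) / sum(1 - g))
    s2_new <- sum(g * (y - mu_new[1])^2 + (1 - g) * (y - mu_new[2])^2) / length(y)
    done <- max(abs(c(mu_new - mu, s2_new - s2))) < tol
    mu <- mu_new; s2 <- s2_new
    if (done) break
  }
  c(delta_hat = diff(mu), sigma_hat = sqrt(s2), snr_hat = diff(mu) / sqrt(s2))
}

set.seed(3)
y <- c(rnorm(150, 0, 1), rnorm(150, 0.5, 1))      # pooled, blinded outcomes
blinded_em(y)
```
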
87

Degradation Analysis for Heterogeneous Data Using Mixture Model

Ji, Yizhen 13 June 2013
No description available.
88

Support Vector Machines for Classification and Imputation

Rogers, Spencer David 16 May 2012
Support vector machines (SVMs) are a powerful tool for classification problems. SVMs have been developed only in the last 20 years, as cheap and abundant computing power has become available. SVMs are a non-statistical approach and make no assumptions about the distribution of the data. Here, support vector machines are applied to a classic data set from the machine learning literature, and the out-of-sample misclassification rates are compared to those of other classification methods. Finally, an algorithm for using support vector machines to address the difficulty of imputing missing categorical data is proposed, and its performance is demonstrated under three different scenarios using data from the 1997 National Labor Survey.
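
A minimal R sketch of both uses with the e1071 package (a common SVM implementation, not necessarily the one used in the thesis): fit a classifier on a classic data set, then reuse the same idea to impute a categorical variable by training on complete rows and predicting the rows where it is missing. The built-in iris data stands in for the thesis's survey data.

```r
# SVM classification and simple categorical imputation with e1071 (sketch).
library(e1071)

set.seed(11)
iris_miss <- iris
miss_rows <- sample(nrow(iris), 15)
iris_miss$Species[miss_rows] <- NA               # pretend these labels are missing

fit     <- svm(Species ~ ., data = iris_miss[-miss_rows, ])
newdat  <- iris_miss[miss_rows, names(iris_miss) != "Species"]
imputed <- predict(fit, newdata = newdat)

table(imputed, truth = iris$Species[miss_rows])  # compare with held-out truth
```
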
89

XPRIME-EM: Eliciting Expert Prior Information for Motif Exploration Using the Expectation-Maximization Algorithm

Zhou, Wei 22 June 2012
Understanding the possible mechanisms of gene transcription regulation is a primary challenge for current molecular biologists. Identifying transcription factor binding sites (TFBSs), also called DNA motifs, is an important step in understanding these mechanisms. Furthermore, many human diseases are attributed to mutations in TFBSs, which makes identifying those DNA motifs significant for disease treatment. Uncertainty and variation in the specific nucleotides of TFBSs present difficulties for DNA motif searching. In this project, we present an algorithm, XPRIME-EM (Eliciting EXpert PRior Information for Motif Exploration using the Expectation-Maximization Algorithm), which can discover known and de novo (unknown) DNA motifs simultaneously from a collection of DNA sequences using a modified EM algorithm, and which describes the variable nature of DNA motifs using a position-specific weight matrix (PWM). XPRIME improves the efficiency of locating and describing motifs by preventing the overlap of multiple motifs, a phenomenon termed a phase shift, and generates stronger motifs by considering the correlations between nucleotides at different positions within each motif. Moreover, a Bayesian formulation of the XPRIME algorithm allows prior information about motifs of interest, elicited from the literature and from experiments, to be incorporated into motif searching. We are the first research team to incorporate human genome-wide nucleosome occupancy information into PWM-based DNA motif searching.
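
The PWM machinery referred to above can be illustrated with a toy example in R: score every window of a DNA sequence by its log-likelihood ratio under a position-specific weight matrix versus a uniform background. The matrix and sequence below are made up for illustration and have nothing to do with the XPRIME-EM results.

```r
# Toy PWM scan: log-likelihood ratio of each window vs. a uniform background.
pwm <- matrix(c(0.80, 0.10, 0.10, 0.70,   # A
                0.05, 0.10, 0.10, 0.10,   # C
                0.10, 0.70, 0.10, 0.10,   # G
                0.05, 0.10, 0.70, 0.10),  # T
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

score_windows <- function(dna, pwm, background = 0.25) {
  w     <- ncol(pwm)
  bases <- strsplit(dna, "")[[1]]
  sapply(seq_len(length(bases) - w + 1), function(i) {
    idx <- match(bases[i:(i + w - 1)], rownames(pwm))  # map A/C/G/T to rows
    sum(log(pwm[cbind(idx, seq_len(w))] / background))
  })
}

scores <- score_windows("CCAGTAGGCAAGTA", pwm)
which.max(scores)   # start position of the most motif-like window
```
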
90

Mixture models for ROC curve and spatio-temporal clustering

Cheam, Amay SM January 2016
Finite mixture models have had a profound impact on the history of statistics, contributing to modelling heterogeneous populations, generalizing distributional assumptions, and, lately, presenting a convenient framework for classification and clustering. A novel approach, via the Gaussian mixture distribution, is introduced for modelling receiver operating characteristic curves. The absence of a closed-form expression for the resulting functional form leads to employing the Monte Carlo method. This approach performs excellently compared to the existing methods when applied to real data. In practice, the data are often non-normal, atypical, or skewed. It is apparent that non-Gaussian distributions should be introduced in order to better fit these data. Two non-Gaussian mixtures, based on the t distribution and the skew-t distribution, are proposed and applied to real data. A novel mixture is also presented to cluster spatial and temporal data. The proposed model defines each mixture component as a mixture of autoregressive polynomials with logistic links. The new model performs significantly better than the most well-known model-based clustering techniques when applied to real data.
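
The Monte Carlo idea for the ROC curve can be sketched directly in R: when the diseased and non-diseased score distributions are Gaussian mixtures, simulate scores from each and trace the empirical ROC. The mixture parameters below are invented for illustration and are not those estimated in the thesis.

```r
# Monte Carlo ROC curve for Gaussian-mixture score distributions (sketch).
rmix <- function(n, p, mu, sd) {
  comp <- sample(seq_along(p), n, replace = TRUE, prob = p)
  rnorm(n, mu[comp], sd[comp])
}

set.seed(5)
n  <- 20000
x0 <- rmix(n, p = c(0.6, 0.4), mu = c(0.0, 1.0), sd = c(1.0, 0.5))  # non-diseased
x1 <- rmix(n, p = c(0.5, 0.5), mu = c(1.5, 3.0), sd = c(1.0, 0.7))  # diseased

cuts <- quantile(c(x0, x1), probs = seq(0, 1, by = 0.01))
fpr  <- sapply(cuts, function(th) mean(x0 > th))
tpr  <- sapply(cuts, function(th) mean(x1 > th))

# Monte Carlo AUC: probability that a diseased score exceeds a non-diseased one
auc <- mean(x1 > sample(x0))
plot(fpr, tpr, type = "l", xlab = "False positive rate", ylab = "True positive rate")
auc
```
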
