Global ETD Search

71	Clustering of temporal gene expression data with mixtures of mixed effects models Lu, Darlene 27 February 2019 (has links) While time-dependent processes are important to biological functions, methods to leverage temporal information from large data have remained computationally challenging. In temporal gene-expression data, clustering can be used to identify genes with shared function in complex processes. Algorithms like K-Means and standard Gaussian mixture-models (GMM) fail to account for variability in replicated data or repeated measures over time and require a priori cluster number assumptions, evaluating many cluster numbers to select an optimal result. An improved penalized-GMM offers a computationally-efficient algorithm to simultaneously optimize cluster number and labels. The work presented in this dissertation was motivated by mice bone-fracture models interested in determining patterns of temporal gene-expression during bone-healing progression. To solve this, an extension to the penalized-GMM was proposed to account for correlation between replicated data and repeated measures over time by introducing random-effects using a mixture of mixed-effects polynomial regression models and an entropy-penalized EM-Algorithm (EPEM). First, performance of EPEM for different mixed-effects models were assessed with simulation studies and applied to the fracture-healing study. Second, modifications to address the high computational cost of EPEM were considered that either clustered subsets of data determined by predicted polynomial-order (S-EPEM) or used modified-initialization to decrease the initial burden (I-EPEM). Each was compared to EPEM and applied to the fracture-healing study. Lastly, as varied rates of fracture-healing were observed for mice with different genetic-backgrounds (strains), a new analysis strategy was proposed to compare patterns of temporal gene-expression between different mice-strains and assessed with simulation studies. Expression-profiles for each strain were treated as separate objects to cluster in order to determine genes clustered into different groups across strain. We found that the addition of random-effects decreased accuracy of predicted cluster labels compared to K-Means, GMM, and fixed-effects EPEM. Polynomial-order optimization with BIC performed with highest accuracy, and optimization on subspaces obtained with singular-value-decomposition performed well. Computation time for S-EPEM was much reduced with a slight decrease in accuracy. I-EPEM was comparable to EPEM with similar accuracy and decrease in computation time. Application of the new analysis strategy on fracture-healing data identified several distinct temporal gene-expression patterns for the different strains. / 2021-02-27T00:00:00Z Biostatistics Clustering EM algorithm Gene expression Mixture model Model selection Polynomial regression
72	BLINDED EVALUATIONS OF EFFECT SIZES IN CLINICAL TRIALS: COMPARISONS BETWEEN BAYESIAN AND EM ANALYSES Turkoz, Ibrahim January 2013 (has links) Clinical trials are major and costly undertakings for researchers. Planning a clinical trial involves careful selection of the primary and secondary efficacy endpoints. The 2010 draft FDA guidance on adaptive designs acknowledges possible study design modifications, such as selection and/or order of secondary endpoints, in addition to sample size re-estimation. It is essential for the integrity of a double-blind clinical trial that individual treatment allocation of patients remains unknown. Methods have been proposed for re-estimating the sample size of clinical trials, without unblinding treatment arms, for both categorical and continuous outcomes. Procedures that allow a blinded estimation of the treatment effect, using knowledge of trial operational characteristics, have been suggested in the literature. Clinical trials are designed to evaluate effects of one or more treatments on multiple primary and secondary endpoints. The multiplicity issues when there is more than one endpoint require careful consideration for controlling the Type I error rate. A wide variety of multiplicity approaches are available to ensure that the probability of making a Type I error is controlled within acceptable pre-specified bounds. The widely used fixed sequence gate-keeping procedures require prospective ordering of null hypotheses for secondary endpoints. This prospective ordering is often based on a number of untested assumptions about expected treatment differences, the assumed population variance, and estimated dropout rates. We wish to update the ordering of the null hypotheses based on estimating standardized treatment effects. We show how to do so while the study is ongoing, without unblinding the treatments, without losing the validity of the testing procedure, and with maintaining the integrity of the trial. Our simulations show that we can reliably order the standardized treatment effect also known as signal-to-noise ratio, even though we are unable to estimate the unstandardized treatment effect. In order to estimate treatment difference in a blinded setting, we must define a latent variable substituting for the unknown treatment assignment. Approaches that employ the EM algorithm to estimate treatment differences in blinded settings do not provide reliable conclusions about ordering the null hypotheses. We developed Bayesian approaches that enable us to order secondary null hypotheses. These approaches are based on posterior estimation of signal-to-noise ratios. We demonstrate with simulation studies that our Bayesian algorithms perform better than existing EM algorithm counterparts for ordering effect sizes. Introducing informative priors for the latent variables, in settings where the EM algorithm has been used, typically improves the accuracy of parameter estimation in effect size ordering. We illustrate our method with a secondary analysis of a longitudinal study of depression. / Statistics Statistics Pharmaceutical Sciences Adaptive Designs Bayesian Mixture Models Blinded Evaluations Em Algorithm Mcmc Secondary Endpoints
73	Degradation Analysis for Heterogeneous Data Using Mixture Model Ji, Yizhen 13 June 2013 (has links) No description available. Industrial Engineering Bayesian hierarchical model Degradation analysis Mixture model EM algorithm
74	Support Vector Machines for Classification and Imputation Rogers, Spencer David 16 May 2012 (has links) (PDF) Support vector machines (SVMs) are a powerful tool for classification problems. SVMs have only been developed in the last 20 years with the availability of cheap and abundant computing power. SVMs are a non-statistical approach and make no assumptions about the distribution of the data. Here support vector machines are applied to a classic data set from the machine learning literature and the out-of-sample misclassification rates are compared to other classification methods. Finally, an algorithm for using support vector machines to address the difficulty in imputing missing categorical data is proposed and its performance is demonstrated under three different scenarios using data from the 1997 National Labor Survey. support vector machines SVM imputation binary classification handwritten digit recognition EM algorithm NLSY97 Statistics and Probability
75	XPRIME-EM: Eliciting Expert Prior Information for Motif Exploration Using the Expectation-Maximization Algorithm Zhou, Wei 22 June 2012 (has links) (PDF) Understanding the possible mechanisms of gene transcription regulation is a primary challenge for current molecular biologists. Identifying transcription factor binding sites (TFBSs), also called DNA motifs, is an important step in understanding these mechanisms. Furthermore, many human diseases are attributed to mutations in TFBSs, which makes identifying those DNA motifs significant for disease treatment. Uncertainty and variations in specific nucleotides of TFBSs present difficulties for DNA motif searching. In this project, we present an algorithm, XPRIME-EM (Eliciting EXpert PRior Information for Motif Exploration using the Expectation-Maximization Algorithm), which can discover known and de novo (unknown) DNA motifs simultaneously from a collection of DNA sequences using a modified EM algorithm and describe the variation nature of DNA motifs using position specific weight matrix (PWM). XPRIME improves the efficiency of locating and describing motifs by prevent the overlap of multiple motifs, a phenomenon termed a phase shift, and generates stronger motifs by considering the correlations between nucleotides at different positions within each motif. Moreover, a Bayesian formulation of the XPRIME algorithm allows for the elicitation of prior information for motifs of interest from literature and experiments into motif searching. We are the first research team to incorporate human genome-wide nucleosome occupancy information into the PWM based DNA motif searching. DNA motif modified EM algorithm human nucleosome occupancy information Statistics and Probability
76	Mixture models for ROC curve and spatio-temporal clustering Cheam, Amay SM January 2016 (has links) Finite mixture models have had a profound impact on the history of statistics, contributing to modelling heterogeneous populations, generalizing distributional assumptions, and lately, presenting a convenient framework for classification and clustering. A novel approach, via Gaussian mixture distribution, is introduced for modelling receiver operating characteristic curves. The absence of a closed-form for a functional form leads to employing the Monte Carlo method. This approach performs excellently compared to the existing methods when applied to real data. In practice, the data are often non-normal, atypical, or skewed. It is apparent that non-Gaussian distributions be introduced in order to better fit these data. Two non-Gaussian mixtures, i.e., t distribution and skew t distribution, are proposed and applied to real data. A novel mixture is presented to cluster spatial and temporal data. The proposed model defines each mixture component as a mixture of autoregressive polynomial with logistic links. The new model performs significantly better compared to the most well known model-based clustering techniques when applied to real data. / Thesis / Doctor of Philosophy (PhD) Finite mixture models ROC curve Spatio-temporal data Functional data Model-based clustering EM algorithm
77	Hierarchical Statistical Models for Large Spatial Data in Uncertainty Quantification and Data Fusion Shi, Hongxiang January 2017 (has links) No description available. Statistics Spatial Hierarchical Modeling Nearest Neighbor Gaussian Process Spatial Random Effect Model EM Algorithm MCMC Algrotihm
78	STATISTICAL APPROACHES TO ANALYZE CENSORED DATA WITH MULTIPLE DETECTION LIMITS ZHONG, WEI January 2005 (has links) No description available. censored data multiple detection limits lognormal distribution EM algorithm meta-analysis permutation test
79	A distributed cooperative algorithm for localization in wireless sensor networks using Gaussian mixture modeling Chowdhury, Tashnim Jabir Shovon January 2016 (has links) No description available. Electrical Engineering Localization wireless sensor networks Gaussian mixture modeling EM algorithm
80	Statistical Inferences under a semiparametric finite mixture model Zhang, Shiju January 2005 (has links) No description available. Statistics biased sampling, EM algorithm, empirical likelihood, finite mixture model, goodness-of-fit test, partial likelihood

Search results