91

BLINDED EVALUATIONS OF EFFECT SIZES IN CLINICAL TRIALS: COMPARISONS BETWEEN BAYESIAN AND EM ANALYSES

Turkoz, Ibrahim January 2013 (has links)
Clinical trials are major and costly undertakings for researchers. Planning a clinical trial involves careful selection of the primary and secondary efficacy endpoints. The 2010 draft FDA guidance on adaptive designs acknowledges possible study design modifications, such as the selection and/or ordering of secondary endpoints, in addition to sample size re-estimation. It is essential for the integrity of a double-blind clinical trial that the individual treatment allocation of patients remains unknown. Methods have been proposed for re-estimating the sample size of clinical trials, without unblinding treatment arms, for both categorical and continuous outcomes. Procedures that allow blinded estimation of the treatment effect, using knowledge of trial operational characteristics, have also been suggested in the literature. Clinical trials are designed to evaluate the effects of one or more treatments on multiple primary and secondary endpoints. The multiplicity issues that arise when there is more than one endpoint require careful consideration to control the Type I error rate, and a wide variety of multiplicity approaches are available to ensure that the probability of making a Type I error stays within acceptable pre-specified bounds. The widely used fixed-sequence gate-keeping procedures require prospective ordering of the null hypotheses for secondary endpoints. This prospective ordering is often based on a number of untested assumptions about expected treatment differences, the assumed population variance, and estimated dropout rates. We wish to update the ordering of the null hypotheses based on estimated standardized treatment effects. We show how to do so while the study is ongoing, without unblinding the treatments, without losing the validity of the testing procedure, and while maintaining the integrity of the trial. Our simulations show that we can reliably order the standardized treatment effect, also known as the signal-to-noise ratio, even though we are unable to estimate the unstandardized treatment effect. To estimate the treatment difference in a blinded setting, we must define a latent variable substituting for the unknown treatment assignment. Approaches that employ the EM algorithm to estimate treatment differences in blinded settings do not provide reliable conclusions about the ordering of the null hypotheses. We developed Bayesian approaches that enable us to order the secondary null hypotheses; these are based on posterior estimation of signal-to-noise ratios. We demonstrate with simulation studies that our Bayesian algorithms perform better than their EM-algorithm counterparts for ordering effect sizes. Introducing informative priors for the latent variables, in settings where the EM algorithm has been used, typically improves the accuracy of parameter estimation in effect size ordering. We illustrate our method with a secondary analysis of a longitudinal study of depression. / Statistics
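
To make the latent-variable idea concrete, here is a minimal sketch, not the author's code, of blinded EM estimation for a two-arm trial: treatment assignment is the latent variable, and the pooled blinded outcomes are modelled as a two-component Gaussian mixture. The 1:1 allocation, common variance, and all numbers are illustrative assumptions.

```python
import numpy as np

def blinded_em(y, n_iter=500, tol=1e-8):
    """EM for the blinded mixture y ~ 0.5*N(mu0, s2) + 0.5*N(mu1, s2).
    Returns the estimated standardized effect (mu1 - mu0) / s."""
    mu0, mu1 = np.quantile(y, [0.25, 0.75])     # crude initial split
    s2 = np.var(y)
    for _ in range(n_iter):
        # E-step: posterior probability that each patient is in arm 1
        d0 = np.exp(-0.5 * (y - mu0) ** 2 / s2)
        d1 = np.exp(-0.5 * (y - mu1) ** 2 / s2)
        w = d1 / (d0 + d1)
        # M-step: update the arm means and the pooled variance
        mu0n = np.sum((1 - w) * y) / np.sum(1 - w)
        mu1n = np.sum(w * y) / np.sum(w)
        s2n = np.mean((1 - w) * (y - mu0n) ** 2 + w * (y - mu1n) ** 2)
        done = abs(mu0n - mu0) + abs(mu1n - mu1) < tol
        mu0, mu1, s2 = mu0n, mu1n, s2n
        if done:
            break
    return (mu1 - mu0) / np.sqrt(s2)

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(0.5, 1.0, 150)])
print(blinded_em(y))   # labels, and hence the sign, stay unidentifiable
```

Consistent with the abstract's point, the component labels (and hence the sign of the effect) are not identifiable without unblinding, and such EM estimates are unreliable for ordering hypotheses, which is what motivates the thesis's Bayesian posterior estimation of the signal-to-noise ratio.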
92

Bayesian Non-parametric Models for Time Series Decomposition

Granados-Garcia, Guillermo 05 January 2023 (has links)
The standard approach to analyzing brain electrical activity is to examine the spectral density function (SDF) and identify frequency bands, defined a priori, that make the most substantial relative contributions to the overall variance of the signal. A limitation of this approach is that the precise frequency and bandwidth of oscillations are not uniform across cognitive demands; thus, these bands should not be set arbitrarily in any analysis. To overcome this limitation, we propose three Bayesian non-parametric models for time series decomposition: data-driven approaches that identify (i) the number of prominent spectral peaks, (ii) the frequency peak locations, and (iii) their corresponding bandwidths (the spread of power around the peaks). The standardized SDF is represented as a Dirichlet process mixture based on a kernel derived from second-order autoregressive processes, which completely characterizes the location (peak) and scale (bandwidth) parameters. A Metropolis-Hastings-within-Gibbs algorithm is developed for sampling from the posterior distribution of the mixture parameters of each model. Simulation studies demonstrate the robustness and performance of the proposed methods. The methods were applied to local field potential (LFP) activity from the hippocampus of laboratory rats across different conditions in a non-spatial sequence memory experiment, to identify the most prominent frequency bands and examine the link between specific patterns of brain oscillatory activity and trial-specific cognitive demands. The second application studies 61 EEG channels from two subjects performing a visual recognition task, to discover frequency-specific oscillations present across brain zones. The third application extends the model to characterize data from 10 alcoholics and 10 controls across three experimental conditions and 30 trials. The proposed models provide a framework for condensing the oscillatory behavior of populations across different tasks, isolating the fundamental components of interest and offering the practitioner several perspectives of analysis.
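
As a sketch of the kernel construction described above (an illustrative parameterization, not the thesis code): an AR(2) process with complex reciprocal roots of modulus r at frequency psi has a spectral peak near psi whose width is governed by r, so the standardized kernel carries exactly the location and bandwidth parameters the abstract mentions.

```python
import numpy as np

def ar2_kernel(freq, psi, r):
    """Standardized AR(2) spectral kernel: peak near `psi` (cycles/sample),
    bandwidth set by the root modulus `r` in (0, 1); r -> 1 narrows the peak."""
    phi1 = 2.0 * r * np.cos(2.0 * np.pi * psi)
    phi2 = -r ** 2
    z = np.exp(-2j * np.pi * freq)
    sdf = 1.0 / np.abs(1.0 - phi1 * z - phi2 * z ** 2) ** 2
    return sdf / (sdf.sum() * (freq[1] - freq[0]))   # integrate to 1

freq = np.linspace(0.0, 0.5, 512)
# A toy two-peak standardized SDF as a weighted mixture of kernels,
# with weights playing the role of the Dirichlet process mixture weights
mix = 0.6 * ar2_kernel(freq, psi=0.10, r=0.95) + 0.4 * ar2_kernel(freq, psi=0.30, r=0.80)
```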
93

An Efficient Implementation of a Robust Clustering Algorithm

Blostein, Martin January 2016 (has links)
Clustering and classification are fundamental problems in statistics and machine learning, with a broad range of applications. A common approach is the Gaussian mixture model, which assumes that each cluster or class arises from a distinct Gaussian distribution. This thesis studies a robust, high-dimensional extension of the Gaussian mixture model that automatically detects outliers and noise, along with a computationally efficient implementation thereof. The contaminated Gaussian distribution is a robust elliptical distribution that allows for automatic detection of "bad points", and is used to robustify the usual factor analysis model. In turn, the mixtures of contaminated Gaussian factor analyzers (MCGFA) algorithm allows robust, high-dimensional clustering, classification, and detection of bad points. A family of MCGFA models is created through the introduction of different constraints on the covariance structure. A new, efficient implementation of the algorithm is presented, along with an account of its development. The fast implementation permits thorough testing of the MCGFA algorithm, and its performance is compared to two natural competitors: parsimonious Gaussian mixture models (PGMM) and mixtures of modified t factor analyzers (MMtFA). The algorithms are tested systematically on simulated and real data. / Thesis / Master of Science (MSc)
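
A minimal sketch of the contaminated Gaussian idea, assuming a fixed contamination proportion alpha and inflation factor eta (in the MCGFA algorithm these are estimated per component rather than fixed):

```python
import numpy as np
from scipy.stats import multivariate_normal

def bad_point_prob(x, mu, sigma, alpha=0.05, eta=10.0):
    """Posterior probability that x is a 'bad point' under the contaminated
    Gaussian (1 - alpha) * N(mu, Sigma) + alpha * N(mu, eta * Sigma);
    alpha is the contamination proportion, eta > 1 inflates the covariance."""
    sigma = np.asarray(sigma, float)
    good = (1.0 - alpha) * multivariate_normal.pdf(x, mu, sigma)
    bad = alpha * multivariate_normal.pdf(x, mu, eta * sigma)
    return bad / (good + bad)

mu, sigma = np.zeros(2), np.eye(2)
print(bad_point_prob([0.1, 0.2], mu, sigma))   # typical point: low probability
print(bad_point_prob([4.0, 4.0], mu, sigma))   # outlier: probability near 1
```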
94

Mixture models for ROC curve and spatio-temporal clustering

Cheam, Amay SM January 2016 (has links)
Finite mixture models have had a profound impact on the history of statistics, contributing to the modelling of heterogeneous populations, generalizing distributional assumptions, and, lately, providing a convenient framework for classification and clustering. A novel approach, via the Gaussian mixture distribution, is introduced for modelling receiver operating characteristic (ROC) curves. Because the resulting functional form has no closed-form expression, Monte Carlo methods are employed. This approach performs excellently compared to existing methods when applied to real data. In practice, data are often non-normal, atypical, or skewed, so it is natural to introduce non-Gaussian distributions to better fit them. Two non-Gaussian mixtures, based on the t distribution and the skew-t distribution, are proposed and applied to real data. A novel mixture is then presented to cluster spatial and temporal data. The proposed model defines each mixture component as a mixture of autoregressive polynomials with logistic links. The new model performs significantly better than the best-known model-based clustering techniques when applied to real data. / Thesis / Doctor of Philosophy (PhD)
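
A hedged sketch of the Monte Carlo step, with hypothetical mixture parameters rather than anything fitted to the thesis data: once the healthy and diseased score distributions are each modelled as Gaussian mixtures, the ROC curve and AUC can be approximated by sampling.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_gmm(weights, means, sds, n, rng):
    """Draw n samples from a univariate Gaussian mixture."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])

# Hypothetical biomarker scores: controls and cases, each a two-component mixture
x0 = sample_gmm([0.7, 0.3], [0.0, 1.0], [1.0, 0.5], 20000, rng)   # controls
x1 = sample_gmm([0.4, 0.6], [1.0, 2.5], [1.0, 0.8], 20000, rng)   # cases

# Empirical ROC over a grid of thresholds: FPR on controls, TPR on cases
thr = np.quantile(np.concatenate([x0, x1]), np.linspace(0.0, 1.0, 200))
fpr = np.array([(x0 > t).mean() for t in thr])
tpr = np.array([(x1 > t).mean() for t in thr])

# Monte Carlo AUC as P(case score > control score)
i, j = rng.integers(0, 20000, 50000), rng.integers(0, 20000, 50000)
print((x1[i] > x0[j]).mean())
```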
95

Bayesian Infinite Mixture Models for Gene Clustering and Simultaneous Context Selection Using High-Throughput Gene Expression Data

Freudenberg, Johannes M. January 2009 (has links)
No description available.
96

Computational Study of Calmodulin’s Ca2+-dependent Conformational Ensembles

Westerlund, Annie M. January 2018 (has links)
Ca2+ and calmodulin play important roles in many physiologically crucial pathways. The conformational landscape of calmodulin is intriguing: conformational changes allow it to bind target proteins, while binding Ca2+ yields population shifts within the landscape. Thus, target proteins become Ca2+-sensitive upon calmodulin binding. Calmodulin regulates more than 300 target proteins, and its mutations are linked to lethal disorders. The mechanisms underlying Ca2+ and target-protein binding are complex and pose interesting questions. Such questions are typically addressed with experiments, which fail to provide simultaneous molecular and dynamical insight. In this thesis, questions on binding mechanisms are probed with molecular dynamics simulations together with tailored unsupervised learning and data analysis. In Paper 1, a free energy landscape estimator based on Gaussian mixture models with cross-validation was developed and used to evaluate the efficiency of regular molecular dynamics compared to temperature-enhanced molecular dynamics. This comparison revealed interesting properties of the free energy landscapes, highlighting different behaviors of the Ca2+-bound and unbound calmodulin conformational ensembles. In Paper 2, spectral clustering was used to shed light on Ca2+ and target-protein binding. With these tools, it was possible to characterize differences in target-protein binding depending on the Ca2+ state as well as on N-terminal or C-terminal lobe binding. This work invites data-driven analysis into the field of biomolecular molecular dynamics, provides further insight into calmodulin's Ca2+ and target-protein binding, and serves as a stepping-stone towards a complete understanding of calmodulin's Ca2+-dependent conformational ensembles. / QC 20180912
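
The Paper 1 estimator can be sketched as follows, with simulated two-dimensional collective variables standing in for the MD data: the number of mixture components is chosen by cross-validation and the landscape is F(x) = -kT log p(x), here in units of kT. This illustrates the idea, not the published implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

# Hypothetical 2-D collective-variable samples from two metastable states
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, (500, 2)),
               rng.normal([2, 1], 0.5, (500, 2))])

# Cross-validated choice of the number of mixture components
gm = GridSearchCV(GaussianMixture(), {"n_components": range(1, 8)}, cv=5).fit(X)
best = gm.best_estimator_

# Free energy surface: F(x) = -kT * log p(x), in units of kT
grid = np.mgrid[-1:3:100j, -1:2:100j].reshape(2, -1).T
F = -best.score_samples(grid)     # score_samples returns the log-density
F -= F.min()                      # shift so the global minimum sits at 0
```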
97

Extensions of D-Optimal Minimal Designs for Mixture Models

Li, Yanyan January 2014 (has links)
The purpose of mixture experiments is to explore the optimum blends of mixture components that will provide desirable response characteristics in finished products. D-optimal minimal designs have been considered for a variety of mixture models, including Scheffé's linear, quadratic, and cubic models. Usually, these D-optimal designs are minimally supported, since they have exactly as many design points as the number of parameters; thus, they lack the degrees of freedom to perform lack-of-fit tests. Moreover, the majority of the design points in D-optimal minimal designs lie on the boundary of the design simplex: its vertices, edges, or faces. In this dissertation, extensions of the D-optimal minimal designs are developed that allow additional interior points in the design space, enabling prediction over the entire response surface. First, extensions of the D-optimal minimal designs for two commonly used second-degree mixture models are considered. Second, the methodology for adding interior points is generalized to arbitrary mixture models, and a new strategy for adding multiple interior points for symmetric mixture models is proposed. Compared with standard mixture designs, the proposed extended D-optimal minimal design provides higher power for the lack-of-fit tests with comparable D-efficiency. / Statistics
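
As an illustration of the D-optimality comparison (a sketch under the Scheffé quadratic model for three components, not the dissertation's code), one can compute log det(X'X) for a minimally supported design and for its extension with an interior centroid point:

```python
import numpy as np
from itertools import combinations

def scheffe_quadratic(points):
    """Model matrix for the Scheffé quadratic mixture model in q components:
    the terms x_i plus all cross-products x_i * x_j (no intercept)."""
    pts = np.asarray(points, float)
    cross = [pts[:, i] * pts[:, j]
             for i, j in combinations(range(pts.shape[1]), 2)]
    return np.column_stack([pts] + cross)

def d_criterion(points):
    """log det(X'X); larger means a more informative design."""
    X = scheffe_quadratic(points)
    return np.linalg.slogdet(X.T @ X)[1]

# q = 3: minimal support = 3 vertices + 3 edge midpoints (6 points, 6 params)
minimal = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
           (.5, .5, 0), (.5, 0, .5), (0, .5, .5)]
extended = minimal + [(1/3, 1/3, 1/3)]       # add the overall centroid
print(d_criterion(minimal), d_criterion(extended))
```

The extra interior point buys a degree of freedom for the lack-of-fit test and interior prediction, which is the trade-off the dissertation quantifies against D-efficiency.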
98

Extending Growth Mixture Models and Handling Missing Values via Mixtures of Non-Elliptical Distributions

Wei, Yuhong January 2017 (has links)
Growth mixture models (GMMs) are used to model intra-individual change and inter-individual differences in change, and to detect underlying group structure in longitudinal studies. These models are regularly fitted under the assumption of normality, an assumption that is frequently invalid. To this end, this thesis focuses on the development of novel non-elliptical growth mixture models to better fit real data. Two non-elliptical growth mixture models, via the multivariate skew-t distribution and the generalized hyperbolic distribution, are developed and applied to simulated and real data. These two models are then extended to accommodate missing values, which are near-ubiquitous in real data. Recently, finite mixtures of non-elliptical distributions have flourished, facilitating flexible clustering of data featuring longer tails and asymmetry; however, real data often contain missing values, so work in this direction is also pursued. A novel approach, via mixtures of generalized hyperbolic distributions and mixtures of multivariate skew-t distributions, is presented to handle missing values in the mixture model-based clustering context. To increase parsimony, families of mixture models are developed by imposing constraints on the component scale matrices whenever missing data occur. Next, a mixture of generalized hyperbolic factor analyzers model is proposed to cluster high-dimensional data with different patterns of missing values. Two missingness indicator matrices are also introduced to ease the computational burden. The algorithms used for parameter estimation are presented, and the performance of the methods is illustrated on simulated and real data. / Thesis / Doctor of Philosophy (PhD)
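
A crude two-stage stand-in for the growth mixture idea, on simulated data: fit each subject's linear growth curve, then cluster the (intercept, slope) estimates. The thesis instead fits the mixture to the trajectories directly, with skew-t or generalized hyperbolic components and model-based handling of missing values; this sketch only illustrates "a mixture over growth trajectories".

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
t = np.arange(5.0)                                  # 5 common time points
n = 200
group = rng.integers(0, 2, n)                       # latent class
inter = np.where(group == 0, 10.0, 14.0) + rng.normal(0, 1.0, n)
slope = np.where(group == 0, 0.5, -0.8) + rng.normal(0, 0.2, n)
Y = inter[:, None] + slope[:, None] * t + rng.normal(0, 0.5, (n, len(t)))

# Stage 1: per-subject least-squares growth curves (intercept, slope)
T = np.column_stack([np.ones_like(t), t])
coef = np.linalg.lstsq(T, Y.T, rcond=None)[0].T     # shape (n, 2)

# Stage 2: mixture model over the growth coefficients recovers the classes
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(coef)
```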
99

Estimating Veterans' Health Benefit Grants Using the Generalized Linear Mixed Cluster-Weighted Model with Incomplete Data

Deng, Xiaoying January 2018 (has links)
The poverty rate among veterans in the US has increased over the past decade, according to the U.S. Department of Veterans Affairs (2015). It is therefore crucial for veterans living below the poverty level to receive sufficient benefit grants, and a study on prudently managing health benefit grants for veterans may help government and policy-makers make appropriate decisions and investments. The purpose of this research is to find an underlying group structure for the veterans' benefit grants dataset and then to estimate the benefit grants sought using incomplete data. The generalized linear mixed cluster-weighted model, based on mixture models, is applied by grouping similar observations into the same cluster. The resulting estimates of the veterans' benefit grants sought will provide a reference for future public policy. / Thesis / Master of Science (MSc)
100

Variational Approximations and Other Topics in Mixture Models

Dang, Sanjeena 24 August 2012 (has links)
Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction almost fifty years ago. Families of mixture models are said to arise when the component parameters, usually the component covariance matrices, are decomposed and a number of constraints are imposed. Within the family setting, it is necessary to choose the member of the family, i.e., the appropriate covariance structure, in addition to the number of mixture components. To date, the Bayesian information criterion (BIC) has proved most effective for this model selection process, and the expectation-maximization (EM) algorithm has been used predominantly for parameter estimation. We deviate from the EM-BIC rubric, using variational Bayes approximations for parameter estimation and the deviance information criterion (DIC) for model selection. The variational Bayes approach alleviates some of the computational complexities associated with the EM algorithm. We apply this approach to the best-known family of Gaussian mixture models, the Gaussian parsimonious clustering models (GPCM), which have an eigen-decomposed covariance structure. Cluster-weighted modelling (CWM) is another flexible statistical framework for modelling local relationships in heterogeneous populations on the basis of weighted combinations of local models. In particular, we extend cluster-weighted models to include an underlying latent factor structure of the independent variable, resulting in a novel family of models known as parsimonious cluster-weighted factor analyzers; here the EM-BIC rubric is utilized for parameter estimation and model selection. Some work on a mixture of multivariate t-distributions is also presented, with a linear model for the mean and a modified Cholesky-decomposed covariance structure leading to a novel family of mixture models. In addition to model-based clustering, these models are used for model-based classification, i.e., semi-supervised clustering. Parameters are estimated using the EM algorithm, and an approach to model selection other than the BIC is also considered. / NSERC PGS-D
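
The contrast between the EM-BIC rubric and a variational alternative can be sketched with scikit-learn on toy data. This is only a stand-in: the thesis works with the GPCM covariance family and the DIC, whereas sklearn's BayesianGaussianMixture is a convenient off-the-shelf variational Gaussian mixture.

```python
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(4, 1, (300, 2))])

# EM-BIC rubric: fit G = 1..6 by EM, pick the model minimizing the BIC
bics = {g: GaussianMixture(n_components=g, random_state=0).fit(X).bic(X)
        for g in range(1, 7)}
g_best = min(bics, key=bics.get)

# Variational alternative: one fit with a generous upper bound on G;
# the variational posterior shrinks the weights of unneeded components
vb = BayesianGaussianMixture(n_components=6, random_state=0).fit(X)
print(g_best, np.round(vb.weights_, 3))
```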
