  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Bayesian Gaussian Graphical models using sparse selection priors and their mixtures

Talluri, Rajesh, August 2011
We propose Bayesian methods for estimating the precision matrix in Gaussian graphical models. The methods yield sparse, adaptively shrunk estimators of the precision matrix, and thus perform model selection and estimation simultaneously. Our methods are based on selection and shrinkage priors that lead to a parsimonious parameterization of the precision (inverse covariance) matrix, which is essential in many applications that involve learning relationships among variables. In Chapter I, we employ a Laplace prior on the off-diagonal elements of the precision matrix, analogous to the lasso in a regression context. This type of prior encourages sparsity while providing shrinkage estimates. We then introduce a novel type of selection prior that induces a sparse structure in the precision matrix by setting most of its elements exactly to zero while ensuring positive-definiteness. In Chapter II we extend these methods to classification. Reverse-phase protein array (RPPA) analysis is a powerful, relatively new platform that allows high-throughput, quantitative analysis of protein networks. One challenge that currently limits the potential of this technology is the lack of methods that allow accurate data modeling and identification of related networks and samples. Such models may improve the accuracy of biological sample classification based on patterns of protein network activation, and provide insight into the distinct biological relationships underlying different cancers. We propose a Bayesian sparse graphical modeling approach, motivated by RPPA data, that places selection priors on the conditional relationships in the presence of class information. We apply our methodology to an RPPA data set generated from panels of human breast cancer and ovarian cancer cell lines.
We demonstrate that the model distinguishes the different cancer cell types more accurately than several existing models and identifies differential regulation of components of a critical signaling network (the PI3K-AKT pathway) between these cancers. This approach represents a powerful new tool for improving our understanding of protein networks in cancer. In Chapter III we extend these methods to mixtures of Gaussian graphical models for clustered data, with each mixture component assumed Gaussian with an adaptive covariance structure. We model the data using Dirichlet processes and finite mixture models, and discuss posterior simulation schemes for the proposed models, including the evaluation of normalizing constants that are functions of the parameters of interest and arise from restrictions on the correlation matrix. We evaluate the operating characteristics of our method via simulations and illustrate it on several real data sets.
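The lasso-type shrinkage described above has a well-known frequentist counterpart, the graphical lasso, which likewise produces sparse precision estimates. As a rough, minimal sketch of the idea (not the Bayesian sampler developed in the thesis), using scikit-learn's `GraphicalLasso` on synthetic data:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# True sparse precision matrix: a chain graph on 5 variables
prec = np.eye(5) + np.diag([0.4] * 4, 1) + np.diag([0.4] * 4, -1)
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(prec), size=2000)

# L1-penalized ML estimate of the precision matrix; the penalty plays
# a role loosely analogous to the Laplace prior on off-diagonal elements
est = GraphicalLasso(alpha=0.05).fit(X).precision_

# Entries absent from the chain graph are shrunk toward (or to) zero
print(np.round(est, 2))
```

The sparsity pattern of `est` recovers the chain structure, with distant pairs such as (0, 4) shrunk away.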
22

Local Mixture Model in Hilbert Space

Zhiyue, Huang, 26 January 2010
In this thesis, we study local mixture models with a Hilbert space structure. First, we consider the fibre bundle structure of local mixture models in a Hilbert space. Next, we introduce a spectral decomposition to construct local mixture models, and we analyze the approximation error asymptotically in the Hilbert space. We then discuss the convexity structure of local mixture models. There are two convexity conditions to consider: the first arises from positivity in the $-1$-affine structure, and the second from the requirement that points lie inside the convex hull of a parametric family. We show that the set of mixture densities lies inside the intersection of the sets defined by these two convexity conditions. Finally, we discuss the impact of the approximation error in the Hilbert space when the domain of the mixing variable changes.
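For context, a local mixture model of order $r$ for a regular parametric density $f(x;\theta)$ is commonly written (following Marriott's construction; the exact form used in the thesis may differ) as:

```latex
g(x;\theta,\lambda) \;=\; f(x;\theta) \;+\; \sum_{k=2}^{r} \lambda_k \,
\frac{\partial^k f(x;\theta)}{\partial \theta^k},
\qquad g(x;\theta,\lambda) \ge 0 \;\text{ for all } x,
```

where the positivity constraint gives rise to the first ($-1$-affine) convexity condition, and the requirement that the density lie in the convex hull of $\{f(\cdot\,;\theta)\}$ gives the second.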
23

Probabilistic Models for Genetic and Genomic Data with Missing Information

Hicks, Stephanie, 16 September 2013
Genetic and genomic data often contain unobservable or missing information. Probabilistic models such as mixture models and hidden Markov models (HMMs) have been widely used since the 1960s to make inference on unobserved information from observed information, demonstrating the versatility and importance of these models. Biological applications of mixture models include gene expression analysis, meta-analysis, disease mapping, epidemiology and pharmacology; applications of HMMs include gene finding, linkage analysis, phylogenetic analysis and identifying regions of identity-by-descent. An important statistical and informatics challenge posed by modern genetics is to understand the functional consequences of genetic variation and its relation to phenotypic variation. In the analysis of whole-exome sequencing data, predicting the impact of missense mutations on protein function is an important factor in identifying and determining the clinical importance of disease susceptibility mutations in the absence of independent data determining impact on disease. Beyond interpreting individual mutations, identifying co-inherited regions in related individuals with Mendelian disorders can further narrow the search for disease susceptibility mutations. In this thesis, we develop two probabilistic models for genetic and genomic data with missing information: 1) a mixture model to estimate a posterior probability of functionality for missense mutations, and 2) an HMM to identify co-inherited regions in the exomes of related individuals. The first application combines functional predictions from available computational (in silico) methods, which often disagree, leaving the user with conflicting evidence when assessing the pathogenic impact of missense mutations on protein function.
The second application considers extensions of a first-order HMM to include conditional emission probabilities varying as a function of minor allele frequency and a second-order dependence structure between observed variant calls. We apply these models to whole-exome sequencing data and show how these models can be used to identify disease susceptibility mutations. As disease-gene identification projects increasingly use next-generation sequencing, the probabilistic models developed in this thesis help identify and associate relevant disease-causing mutations with human disorders. The purpose of this thesis is to demonstrate that probabilistic models can contribute to more accurate and dependable inference based on genetic and genomic data with missing information.
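The first-order HMM machinery underlying such models rests on the forward recursion for the observed-data likelihood. A generic sketch follows; the two-state chain, its transition and emission values, and the genotype-sharing classes are hypothetical illustrations, not the thesis's model:

```python
import numpy as np

def forward_loglik(obs, log_pi, log_A, log_B):
    """Forward algorithm in log space.
    obs:    sequence of observation indices
    log_pi: (S,) log initial state probabilities
    log_A:  (S, S) log transition matrix, A[i, j] = P(j | i)
    log_B:  (S, O) log emission matrix
    Returns the total log-likelihood of the sequence."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # log-sum-exp over the previous state for each current state
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Hypothetical 2-state chain (IBD vs. not-IBD) with 2 observed
# genotype-sharing classes
pi = np.log([0.5, 0.5])
A = np.log([[0.95, 0.05], [0.10, 0.90]])
B = np.log([[0.9, 0.1], [0.4, 0.6]])
print(forward_loglik([0, 0, 1, 0], pi, A, B))
```

Extensions like the thesis's (emissions varying with minor allele frequency, second-order dependence) enlarge `log_B` and the recursion but keep this same dynamic-programming skeleton.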
25

Structured Bayesian learning through mixture models

Petralia, Francesca, January 2013
In this thesis, we develop Bayesian mixture density estimation methods for univariate and multivariate data. We begin by proposing a repulsive process that favors mixture components being further apart. When conducting inference on cluster-specific parameters, current frequentist and Bayesian methods often encounter problems when clusters are placed too close together to be scientifically meaningful. Current Bayesian practice generates component-specific parameters independently from a common prior, which tends to favor similar components and often assigns substantial probability to redundant components that are not needed to fit the data. As an alternative, we propose generating components from a repulsive process, which leads to fewer, better separated and more interpretable clusters.

In the second part of the thesis, we address the problem of modeling the conditional distribution of a response variable given a high-dimensional vector of predictors potentially concentrated near a lower-dimensional subspace or manifold. In many settings it is important to allow not only the mean but also the variance and shape of the response density to change flexibly with massive-dimensional features. We propose a multiresolution model that scales efficiently to massive numbers of features and can be implemented efficiently with slice sampling.

In the third part of the thesis, we deal with the problem of characterizing the conditional density of a multivariate response vector given a potentially high-dimensional vector of predictors. The proposed model flexibly characterizes the density of the response by hierarchically coupling a collection of factor models, each defined on a different scale of resolution. As illustrated in Chapter 4, the proposed method achieves good predictive performance compared to competing models while scaling efficiently to high-dimensional predictors. / Dissertation
26

Model-based clustering of high-dimensional binary data

Tang, Yang, 05 September 2013
We present a mixture of latent trait models with common slope parameters (MCLT) for high dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based on a d-dimensional Gaussian latent variable, is extended by implementing common factor analyzers. We extend the model further by the incorporation of random block effects. The dependencies in each block are taken into account through block-specific parameters that are considered to be random variables. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. The Bayesian information criterion is used to select the number of components and the covariance structure as well as the dimensions of latent variables. Our approach is demonstrated on U.S. Congressional voting data and on a data set describing the sensory properties of orange juice. Our examples show that our model performs well even when the number of observations is not very large relative to the data dimensionality. In both cases, our approach yields intuitive clustering results. Additionally, our dimensionality-reduction method allows data to be displayed in low-dimensional plots. / Early Researcher Award from the Government of Ontario (McNicholas); NSERC Discovery Grants (Browne and McNicholas).
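The variational algorithm for the MCLT model itself is more involved, but the basic mechanism of model-based clustering for binary data can be sketched with an EM algorithm for a finite mixture of independent Bernoullis, a simpler relative of the latent trait model (all data and settings below are illustrative):

```python
import numpy as np

def bernoulli_mixture_em(X, K, n_iter=100, seed=0):
    """EM for a K-component mixture of independent Bernoullis.
    X: (n, d) binary matrix. Returns mixing weights w, component
    probabilities mu, and posterior responsibilities r."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(K, 1.0 / K)
    mu = rng.uniform(0.25, 0.75, size=(K, d))
    for _ in range(n_iter):
        # E-step: responsibilities, computed on the log scale for stability
        logp = (X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and component probabilities
        nk = r.sum(axis=0)
        w = nk / n
        mu = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return w, mu, r

# Two well-separated synthetic binary clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.random((60, 8)) < 0.9,
               rng.random((60, 8)) < 0.1]).astype(float)
w, mu, r = bernoulli_mixture_em(X, K=2)
```

The MCLT model replaces the independence assumption within components with a low-dimensional latent trait, which is what makes the variational approximation necessary.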
27

Computational Methods for Comparative Analysis of Rare Cell Subsets in Flow Cytometry

Frelinger, Jacob Jeffrey, January 2013
Automated analysis techniques for flow cytometry data can address many of the limitations of manual analysis by providing an objective approach to the identification of cellular subsets. While automated analysis has the potential to significantly improve on manual analysis, challenges remain for automated methods in cross-sample analysis for large-scale studies. This thesis presents new methods for data normalization, sample enrichment for rare events of interest, and cell subset relabeling. These methods build upon and extend the use of Gaussian mixture models in automated flow cytometry analysis to enable practical large-scale cell subset identification. / Dissertation
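The Gaussian-mixture machinery these methods build on can be sketched on synthetic data with a rare population (the marker values and sizes below are invented for illustration; this is not the thesis's pipeline):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 2-marker "cytometry" data: one abundant population near the
# origin and one rare, tight population far away
common = rng.normal([0.0, 0.0], 0.5, size=(5000, 2))
rare = rng.normal([4.0, 4.0], 0.3, size=(50, 2))
X = np.vstack([common, rare])

# Fit a 2-component Gaussian mixture and assign each event to a component
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
```

With well-separated populations the rare events land in their own component; the harder practical problems (normalization across samples, enriching for rare subsets) are what the thesis addresses on top of this base.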
28

A model-based frequency constraint for mining associations from transaction data

Hahsler, Michael, January 2004
In this paper we develop an alternative to minimum support that utilizes knowledge of the process generating transaction data and allows for highly skewed frequency distributions. We apply a simple stochastic model (the NB model), known to describe item occurrences in transaction data well, to develop a frequency constraint. This model-based frequency constraint is used together with a precision threshold to find individual support thresholds for groups of associations. We develop the notion of NB-frequent itemsets and present two mining algorithms that find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint can provide significant improvements over a single minimum support threshold and that the precision threshold is easier to use. / Series: Working Papers on Information Systems, Information Business and Operations
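A rough sketch of the model-based idea, loosely inspired by the paper: fit a negative binomial baseline for item occurrence counts, then flag items whose observed counts are improbably large under that baseline at a chosen tail level. The NB parameters, tail level, and item counts below are hypothetical, and this is a simplification of the paper's precision-threshold machinery:

```python
import numpy as np
from scipy.stats import nbinom

def nb_count_threshold(r, p, alpha=0.01):
    """Smallest count c such that P(Count >= c) <= alpha under a
    negative binomial baseline NB(r, p)."""
    return int(nbinom.ppf(1 - alpha, r, p)) + 1

# Hypothetical NB baseline for item occurrences across transactions
r, p = 2.0, 0.1                      # implied mean r * (1 - p) / p = 18
thr = nb_count_threshold(r, p, alpha=0.01)

# Items whose counts exceed the model-based threshold are "frequent"
counts = {"itemA": 20, "itemB": 75, "itemC": 12}
frequent = [item for item, c in counts.items() if c >= thr]
```

The point of the construction is that the threshold adapts to the fitted occurrence process rather than being a single global minimum support.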
29

Assessment of a Credit Value-at-Risk for Corporate Credits

Kremer, Laura, January 2013
In this thesis I describe the essential steps of developing a credit rating system. This comprises the credit scoring process, which assigns a credit score to each credit; the formation of rating classes by the k-means algorithm; and the assignment of a probability of default (PD) to each rating class. The main focus is PD estimation, for which two approaches are presented. The first, simpler approach, in the form of a calibration curve, assumes that the defaults of different corporate credits are independent. The second approach, based on mixture models, is more realistic because it takes default dependence into account. With these models, an estimate of a country's GDP can be used to estimate the Value-at-Risk of a credit portfolio.
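The first two steps of the pipeline described (scores, k-means rating classes, empirical PDs under the independence assumption) can be sketched as follows; the score distribution and default mechanism are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical credit scores and defaults: lower score => higher PD
scores = rng.normal(600, 80, size=1000)
defaults = rng.random(1000) < 1.0 / (1.0 + np.exp((scores - 500) / 40))

# Form rating classes by clustering the one-dimensional scores
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scores.reshape(-1, 1))
classes = km.labels_

# Empirical PD per rating class: the simple calibration-curve approach,
# which assumes defaults are independent across credits
pds = {c: defaults[classes == c].mean() for c in range(5)}
```

The mixture-model approach of the thesis replaces the last step, letting a common factor (such as GDP) induce dependence among defaults within and across classes.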
30

Separation of Points and Interval Estimation in Mixed Dose-Response Curves with Selective Component Labeling

Flake, Darl D., II, 01 May 2016
This dissertation develops, applies, and investigates new methods to improve the analysis of logistic regression mixture models. An interesting dose-response experiment was previously carried out on a mixed population, in which the class membership of only a subset of subjects (survivors) was subsequently labeled. Early analyses of the dataset encountered challenges with separation of points and asymmetric confidence intervals. This dissertation extends those analyses by characterizing the model as a mixture of penalized (Firth) logistic regressions and by developing methods for constructing profile-likelihood-based confidence intervals, inverse intervals, and confidence bands in the context of such a model. The proposed methods are applied to the motivating dataset and a related dataset, resulting in improved inference on model parameters. Additionally, a simulation experiment further illustrates the benefits of the proposed methods and begins to explore better designs for future studies. The penalized model is shown to be less biased than the traditional model, and profile-likelihood-based intervals are shown to have better coverage probability than Wald-type intervals. Some limitations, extensions, and alternatives to the proposed methods are discussed.
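A minimal sketch of Firth-penalized logistic regression, the building block of the mixture described above. The adjusted-score form used here is the standard one for the Jeffreys-prior penalty, and the data are a toy completely separated example, not the motivating dataset:

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Firth-penalized logistic regression via Newton iteration.
    Adjusted score: X' (y - p + h * (0.5 - p)), where h is the diagonal
    of the hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
        h = np.einsum("ij,jk,ik->i", X, XtWX_inv, X) * W
        step = XtWX_inv @ (X.T @ (y - p + h * (0.5 - p)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated toy data: the ordinary MLE diverges,
# but the Firth estimate stays finite
X = np.column_stack([np.ones(8), [-3, -2, -2, -1, 1, 2, 2, 3]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
beta = firth_logistic(X, y)
```

The finite slope under complete separation is exactly the property that makes the Firth penalty attractive in the separated-points setting the dissertation describes.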
