21

Local Mixture Model in Hilbert Space

Zhiyue, Huang 26 January 2010 (has links)
In this thesis, we study local mixture models with a Hilbert space structure. First, we consider the fibre bundle structure of local mixture models in a Hilbert space. Next, the spectral decomposition is introduced in order to construct local mixture models. We analyze the approximation error asymptotically in the Hilbert space. We then discuss the convexity structure of local mixture models. There are two forms of convexity condition to consider: the first arises from positivity in the $-1$-affine structure, and the second from the requirement that points lie inside the convex hull of a parametric family. It is shown that the set of mixture densities lies inside the intersection of the sets defined by these two convexities. Finally, we discuss the impact of the approximation error in the Hilbert space when the domain of the mixing variable changes.
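
For context, a local mixture model of a regular parametric family $f(x;\theta)$ is commonly built as an expansion in higher derivatives of the density; the sketch below shows this standard form, though the exact construction used in the thesis may differ.

```latex
g(x;\theta,\lambda) \;=\; f(x;\theta) \;+\; \sum_{j=2}^{k} \lambda_j \, \frac{\partial^{j} f(x;\theta)}{\partial \theta^{j}}
```

Requiring $g(x;\theta,\lambda) \ge 0$ for all $x$ corresponds to the first, $-1$-affine convexity condition described in the abstract.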
22

Probabilistic Models for Genetic and Genomic Data with Missing Information

Hicks, Stephanie 16 September 2013 (has links)
Genetic and genomic data often contain unobservable or missing information. Probabilistic models such as mixture models and hidden Markov models (HMMs) have been widely applied since the 1960s to make inferences about unobserved information from observed information, demonstrating the versatility and importance of these models. Biological applications of mixture models include gene expression data, meta-analysis, disease mapping, epidemiology, and pharmacology; applications of HMMs include gene finding, linkage analysis, phylogenetic analysis, and identifying regions of identity-by-descent. An important statistical and informatics challenge posed by modern genetics is to understand the functional consequences of genetic variation and its relation to phenotypic variation. In the analysis of whole-exome sequencing data, predicting the impact of missense mutations on protein function is an important factor in identifying and determining the clinical importance of disease susceptibility mutations in the absence of independent data on their impact on disease. Beyond interpretation, identifying co-inherited regions in related individuals with Mendelian disorders can further narrow the search for disease susceptibility mutations. In this thesis, we develop two probabilistic models for genetic and genomic data with missing information: 1) a mixture model to estimate a posterior probability of functionality for missense mutations, and 2) an HMM to identify co-inherited regions in the exomes of related individuals. The first application combines functional predictions from available computational (in silico) methods, which often disagree and leave the user with conflicting assessments of the pathogenic impact of missense mutations on protein function. The second application considers extensions of a first-order HMM to include conditional emission probabilities that vary as a function of minor allele frequency, and a second-order dependence structure between observed variant calls. We apply these models to whole-exome sequencing data and show how they can be used to identify disease susceptibility mutations. As disease-gene identification projects increasingly use next-generation sequencing, the probabilistic models developed in this thesis help identify and associate relevant disease-causing mutations with human disorders. The purpose of this thesis is to demonstrate that probabilistic models can contribute to more accurate and dependable inference from genetic and genomic data with missing information.
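
A minimal sketch of the kind of two-component mixture computation the first model describes: a posterior probability that a mutation is functional, assuming (for illustration only) that combined in-silico scores follow a two-component Gaussian mixture; all names and parameter values here are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def posterior_functional(scores, mu, sigma, pi):
    """Posterior P(functional | score) under a two-component Gaussian
    mixture: component 1 = functional, component 0 = neutral;
    pi is the prior probability that a mutation is functional."""
    lik_fun = norm.pdf(scores, mu[1], sigma[1])  # density under "functional"
    lik_neu = norm.pdf(scores, mu[0], sigma[0])  # density under "neutral"
    return pi * lik_fun / (pi * lik_fun + (1 - pi) * lik_neu)

# Combined scores for three hypothetical missense mutations
p = posterior_functional(np.array([0.2, 0.6, 0.9]),
                         mu=np.array([0.3, 0.8]),
                         sigma=np.array([0.15, 0.10]),
                         pi=0.4)
```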
24

Structured Bayesian learning through mixture models

Petralia, Francesca January 2013 (has links)
In this thesis, we develop Bayesian mixture density estimation methods for univariate and multivariate data. We start by proposing a repulsive process that favors mixture components lying further apart. When conducting inference on cluster-specific parameters, current frequentist and Bayesian methods often encounter problems when clusters are placed too close together to be scientifically meaningful. Current Bayesian practice generates component-specific parameters independently from a common prior, which tends to favor similar components and often assigns substantial probability to redundant components that are not needed to fit the data. As an alternative, we propose to generate components from a repulsive process, which leads to fewer, better separated, and more interpretable clusters.

In the second part of the thesis, we address the problem of modeling the conditional distribution of a response variable given a high-dimensional vector of predictors potentially concentrated near a lower-dimensional subspace or manifold. In many settings it is important to allow not only the mean but also the variance and shape of the response density to change flexibly with the features, which are massive-dimensional. We propose a multiresolution model that scales efficiently to massive numbers of features and can be implemented efficiently with slice sampling.

In the third part of the thesis, we deal with the problem of characterizing the conditional density of a multivariate response vector given a potentially high-dimensional vector of predictors. The proposed model flexibly characterizes the density of the response variable by hierarchically coupling a collection of factor models, each defined at a different scale of resolution. As illustrated in Chapter 4, the proposed method achieves good predictive performance compared to competing models while efficiently scaling to high-dimensional predictors. / Dissertation
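
A sketch of the repulsive-prior idea from the first part: component locations $\mu_1,\dots,\mu_K$ are drawn jointly under a density that vanishes as any two components coincide; the specific repulsion function used in the thesis may differ.

```latex
\pi(\mu_1,\dots,\mu_K) \;\propto\; \Bigl[\prod_{k=1}^{K} g_0(\mu_k)\Bigr] \prod_{1 \le i < j \le K} h\bigl(d(\mu_i,\mu_j)\bigr)
```

Here $g_0$ is a base prior, $d$ a distance, and $h$ an increasing function with $h(0)=0$, so configurations with nearly coincident components receive negligible prior mass.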
25

Model-based clustering of high-dimensional binary data

Tang, Yang 05 September 2013 (has links)
We present a mixture of latent trait models with common slope parameters (MCLT) for high dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based on a d-dimensional Gaussian latent variable, is extended by implementing common factor analyzers. We extend the model further by the incorporation of random block effects. The dependencies in each block are taken into account through block-specific parameters that are considered to be random variables. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. The Bayesian information criterion is used to select the number of components and the covariance structure as well as the dimensions of latent variables. Our approach is demonstrated on U.S. Congressional voting data and on a data set describing the sensory properties of orange juice. Our examples show that our model performs well even when the number of observations is not very large relative to the data dimensionality. In both cases, our approach yields intuitive clustering results. Additionally, our dimensionality-reduction method allows data to be displayed in low-dimensional plots. / Early Researcher Award from the Government of Ontario (McNicholas); NSERC Discovery Grants (Browne and McNicholas).
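
The Bayesian information criterion used for model selection above has the usual closed form; a minimal sketch follows, with hypothetical log-likelihoods and parameter counts standing in for fitted models.

```python
import numpy as np

def bic(log_likelihood, n_params, n_obs):
    """BIC = 2*logL - p*log(n); larger is better under this sign
    convention, common in the model-based clustering literature."""
    return 2.0 * log_likelihood - n_params * np.log(n_obs)

# Hypothetical (logL, p) pairs for G = 1, 2, 3 components; n = 435 voters
candidates = {1: (-5420.3, 25), 2: (-5190.7, 51), 3: (-5185.2, 77)}
best_G = max(candidates, key=lambda G: bic(*candidates[G], n_obs=435))
```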
26

Computational Methods for Comparative Analysis of Rare Cell Subsets in Flow Cytometry

Frelinger, Jacob Jeffrey January 2013 (has links)
Automated analysis techniques for flow cytometry data can address many of the limitations of manual analysis by providing an objective approach to the identification of cellular subsets. While automated analysis has the potential to improve results significantly, challenges remain for automated methods in cross-sample analysis for large-scale studies. This thesis presents new methods for data normalization, sample enrichment for rare events of interest, and cell subset relabeling. These methods build upon and extend the use of Gaussian mixture models in automated flow cytometry analysis to enable practical large-scale cell subset identification. / Dissertation
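
A minimal sketch of Gaussian-mixture-based cell subset identification, using scikit-learn's off-the-shelf implementation rather than the methods developed in the thesis; the data are synthetic stand-ins for transformed fluorescence measurements.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic events: rows = cells, columns = fluorescence channels
rng = np.random.default_rng(0)
events = np.vstack([rng.normal(0.0, 1.0, (950, 4)),   # abundant subset
                    rng.normal(3.0, 0.5, (50, 4))])   # rare subset

gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(events)
labels = gmm.predict(events)      # hard cell-subset assignments
resp = gmm.predict_proba(events)  # soft responsibilities per subset
```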
27

A model-based frequency constraint for mining associations from transaction data

Hahsler, Michael January 2004 (has links) (PDF)
In this paper we develop an alternative to minimum support which utilizes knowledge of the process that generates transaction data and allows for highly skewed frequency distributions. We apply a simple stochastic model (the NB model), which is known to describe item occurrences in transaction data well, to develop a frequency constraint. This model-based frequency constraint is used together with a precision threshold to find individual support thresholds for groups of associations. We develop the notion of NB-frequent itemsets and present two mining algorithms which find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint can provide significant improvements over a single minimum support threshold and that the precision threshold is easier to use. (author's abstract) / Series: Working Papers on Information Systems, Information Business and Operations
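
A hedged sketch of turning a fitted negative binomial baseline into a frequency threshold: choose the smallest count whose upper-tail probability drops below a chosen level. The paper's actual NB-frequency and precision machinery is more involved, and the parameters below are hypothetical.

```python
from scipy.stats import nbinom

def nb_threshold(r, p, alpha):
    """Smallest count c with P(X >= c) <= alpha under NB(r, p);
    note sf(c - 1) = P(X > c - 1) = P(X >= c)."""
    c = 0
    while nbinom.sf(c - 1, r, p) > alpha:
        c += 1
    return c

threshold = nb_threshold(r=0.5, p=0.01, alpha=0.001)
```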
28

Assessment of a Credit Value at Risk for Corporate Credits

Kremer, Laura January 2013 (has links)
In this thesis I describe the essential steps of developing a credit rating system. This comprises the credit scoring process that assigns a credit score to each credit, the formation of rating classes via the k-means algorithm, and the assignment of a probability of default (PD) to each rating class. The main focus is on PD estimation, for which two approaches are presented. The first, simple approach, in the form of a calibration curve, assumes independence of the defaults of different corporate credits. The second approach, based on mixture models, is more realistic as it takes default dependence into account. With these models we can use an estimate of a country's GDP to calculate an estimate of the Value-at-Risk of a credit portfolio.
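
A minimal sketch of the first, independence-based approach: form rating classes from credit scores with k-means and assign each class its empirical default rate as the PD. The data are synthetic, and the calibration-curve and mixture-model refinements from the thesis are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic credit scores and default flags for a portfolio
rng = np.random.default_rng(1)
scores = rng.normal(600.0, 80.0, size=(2000, 1))
defaults = rng.random(2000) < 1.0 / (1.0 + np.exp((scores[:, 0] - 450.0) / 40.0))

# Rating classes via k-means on the scores; empirical PD per class
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scores)
classes = kmeans.labels_
pd_per_class = {k: defaults[classes == k].mean() for k in range(5)}
```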
29

Separation of Points and Interval Estimation in Mixed Dose-Response Curves with Selective Component Labeling

Flake, Darl D., II 01 May 2016 (has links)
This dissertation develops, applies, and investigates new methods to improve the analysis of logistic regression mixture models. An interesting dose-response experiment was previously carried out on a mixed population, in which the class membership of only a subset of subjects (survivors) was subsequently labeled. In early analyses of the dataset, challenges with separation of points and asymmetric confidence intervals were encountered. This dissertation extends the previous analyses by characterizing the model as a mixture of penalized (Firth) logistic regressions and developing methods for constructing profile-likelihood-based confidence intervals, inverse intervals, and confidence bands in the context of such a model. The proposed methods are applied to the motivating dataset and another related dataset, resulting in improved inference on model parameters. Additionally, a simulation experiment is carried out to further illustrate the benefits of the proposed methods and to begin to explore better designs for future studies. The penalized model is shown to be less biased than the traditional model, and profile-likelihood-based intervals are shown to have better coverage probability than Wald-type intervals. Some limitations, extensions, and alternatives to the proposed methods are discussed.
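
For reference, the Firth penalty modifies the logistic log-likelihood $\ell(\beta)$ with a Jeffreys-prior term, which keeps estimates finite under the complete or quasi-complete separation alluded to above:

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\bigl\lvert I(\beta)\bigr\rvert
```

where $I(\beta)$ is the Fisher information matrix; maximizing $\ell^{*}$ rather than $\ell$ is what addresses the separation-of-points issue described in the abstract.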
30

Analysis of Four and Five-Way Data and Other Topics in Clustering

Tait, Peter A. January 2021 (has links)
Clustering is the process of finding underlying group structure in data. As the scale of data collection continues to grow, this “big data” phenomenon results in more complex data structures. These data structures are not always compatible with traditional clustering methods, making their use problematic. This thesis presents methodology for analyzing samples of four-way and higher-order data, examples of these more complex data types. These data structures consist of samples of continuous data arranged in multidimensional arrays. A large emphasis is placed on clustering these data using mixture models that leverage tensor-variate distributions. Parameter estimation for all of these methods is based on the expectation-maximization algorithm. Both simulated and real data are used for illustration. / Thesis / Doctor of Science (PhD)
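
A schematic of the E-step underlying expectation-maximization for any finite mixture; the tensor-variate component densities from the thesis would be supplied through the `logpdf` argument (all names here are hypothetical).

```python
import numpy as np
from scipy.stats import norm

def e_step(X, weights, logpdf, params):
    """Responsibilities r[i, g] = P(component g | observation i) for a
    G-component mixture; logpdf(X, params[g]) returns per-observation
    log-densities, and log-sum-exp keeps the computation stable."""
    log_r = np.stack([np.log(weights[g]) + logpdf(X, params[g])
                      for g in range(len(weights))], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

# Univariate Gaussian components as stand-ins for tensor-variate ones
X = np.array([0.1, 2.9, 3.1, -0.2])
r = e_step(X, weights=[0.5, 0.5],
           logpdf=lambda X, p: norm.logpdf(X, *p),
           params=[(0.0, 1.0), (3.0, 1.0)])
```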
