
Nonparametric Bayesian analysis of some clustering problems

Nonparametric Bayesian models have been researched extensively in the past 10 years
following the work of Escobar and West (1995) on sampling schemes for Dirichlet processes.
The infinite mixture representation of the Dirichlet process makes it useful
for clustering problems where the number of clusters is unknown. We develop nonparametric
Bayesian models for two different clustering problems, namely functional
and graphical clustering.
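
As an illustration of why a Dirichlet process suits problems with an unknown number of clusters, the following minimal sketch draws a random partition from the Chinese restaurant process, the partition distribution that a Dirichlet process mixture induces. It is not the sampler used in the thesis; the function name `crp_partition` and the concentration parameter `alpha` are assumptions made for this example.

```python
import random


def crp_partition(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (hypothetical name): larger
    values make it more likely that an item opens a new cluster.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_partition(50, alpha=1.0)
```

The number of clusters, `len(counts)`, is not fixed in advance; it is a random outcome of the draw and tends to grow slowly with the number of items.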
We propose a nonparametric Bayes wavelet model for clustering of functional or
longitudinal data. The wavelet modelling is aimed at the resolution of global and
local features during clustering. The model also allows the elicitation of prior belief
about the regularity of the functions and has the ability to adapt to a wide range
of functional regularity. Posterior inference is carried out by Gibbs sampling with
conjugate priors for fast computation. We use simulated as well as real datasets to
illustrate the advantages of the approach over alternative methods.
The functional clustering model is extended to analyze splice microarray data.
New microarray technologies probe consecutive segments along genes to observe alternative
splicing (AS) mechanisms that produce multiple proteins from a single gene.
Clues regarding the number of splice forms can be obtained by clustering the functional
expression profiles from different tissues. The analysis was carried out on the Rosetta
dataset (Johnson et al., 2003) to obtain a splice variant by tissue distribution
for all 10,000 genes. We were able to identify a number of splice forms that appear
to be unique to cancer.
We propose a Bayesian model for partitioning graphs depicting dependencies
in a collection of objects. After suitable transformations and modelling techniques,
the problem of graph cutting can be approached by nonparametric Bayes clustering.
We draw motivation from recent work (Dhillon, 2001) showing the equivalence of
kernel k-means clustering and certain graph cutting algorithms. We show that
loss functions similar to that of kernel k-means arise naturally in this model, and that
minimizing the associated posterior risk constitutes an effective graph cutting strategy.
We present here results from the analysis of two microarray datasets, namely the
melanoma dataset (Bittner et al., 2000) and the sarcoma dataset (Nykter et al.,
2006).
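
For readers unfamiliar with the kernel k-means loss mentioned above, here is a minimal sketch of the standard objective computed from a kernel (similarity) matrix and a set of cluster labels. It is the textbook form, not the loss function derived in the thesis; the function name `kernel_kmeans_loss` and the toy data are assumptions made for this example.

```python
def kernel_kmeans_loss(K, labels):
    """Standard kernel k-means objective for a kernel matrix K.

    For each cluster c, adds sum_{i in c} K[i][i]
    minus (1 / |c|) * sum_{i, j in c} K[i][j].
    """
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    loss = 0.0
    for members in clusters.values():
        diag = sum(K[i][i] for i in members)
        block = sum(K[i][j] for i in members for j in members)
        loss += diag - block / len(members)
    return loss


# Toy data: four points on a line with a linear kernel K[i][j] = x_i * x_j.
x = [0.0, 2.0, 10.0, 12.0]
K = [[a * b for b in x] for a in x]
good = kernel_kmeans_loss(K, [0, 0, 1, 1])  # nearby points grouped together
bad = kernel_kmeans_loss(K, [0, 1, 0, 1])   # distant points grouped together
```

With a linear kernel this reduces to the ordinary k-means within-cluster sum of squares, so the partition that groups nearby points has the lower loss (`good < bad`).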

Identifier: oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/4251
Date: 30 October 2006
Creators: Ray, Shubhankar
Contributors: Carroll, Raymond J., Mallick, Bani K.
Publisher: Texas A&M University
Source Sets: Texas A and M University
Language: en_US
Detected Language: English
Type: Book, Thesis, Electronic Dissertation, text
Format: 857243 bytes, electronic, application/pdf, born digital
