Bayesian nonparametric clustering based on Dirichlet processes

Following a review of some traditional methods of clustering, we review the Bayesian nonparametric framework for modelling object attribute differences. We focus on Dirichlet Process (DP) mixture models, in which the clusters observed in any particular data set are viewed not as a fixed, prespecified set but as representatives of a latent structure comprising a potentially infinite number of clusters. As more information about attribute differences is revealed, the number of inferred clusters is allowed to grow. We begin by studying DP mixture models for normal data and show how to adapt one of the most widely used conditional methods for computation to improve sampling efficiency. This scheme is then generalized, followed by an application to discrete data. The DP's dispersion parameter is critical in controlling the number of clusters, and we propose a percentile-based framework for specifying the hyperparameters of this parameter. This research was motivated by the analysis of product trials at the magazine Which?, where brand attributes are usually assessed on a 5-point preference scale by experts or by a random selection of Which? subscribers. We conclude with a simulation study, in which we replicate some of the standard trials at Which? and compare the performance of our DP mixture models against various other popular frequentist and Bayesian multiple comparison routines adapted for clustering.
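As a minimal illustration of the role the dispersion parameter plays in a DP mixture (not code from the thesis), the sketch below simulates cluster assignments under the Chinese restaurant process representation of the DP and reports how the expected number of clusters grows with the dispersion parameter alpha; the function name and settings are illustrative assumptions only.

```python
import numpy as np

def crp_cluster_counts(n, alpha, rng):
    """Simulate cluster assignments for n items under a Chinese restaurant
    process with dispersion (concentration) parameter alpha, and return the
    number of distinct clusters formed."""
    counts = []  # counts[k] = number of items already assigned to cluster k
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha),
        # or starts a new cluster with probability alpha / (i + alpha).
        probs = np.array(counts + [alpha]) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1
    return len(counts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    for alpha in (0.5, 1.0, 5.0):
        ks = [crp_cluster_counts(n, alpha, rng) for _ in range(500)]
        # The expected number of clusters is roughly alpha * log(1 + n / alpha),
        # so larger alpha induces more clusters a priori.
        print(f"alpha={alpha}: mean clusters ~ {np.mean(ks):.1f} "
              f"(approx. {alpha * np.log(1 + n / alpha):.1f})")
```

This is only meant to show why a prior on alpha (for example, one elicited through percentiles of the induced number of clusters) directly shapes the number of clusters the model will infer.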

Identifier oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:565025
Date January 2010
Creators Murugiah, S.
Publisher University College London (University of London)
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation
Source http://discovery.ucl.ac.uk/20467/
