Nonparametric Bayesian Methods for Extracting Structure from Data

One desirable property of machine learning
algorithms is the ability to balance
the number of parameters in a model
in accordance with the amount of available data.
Incorporating nonparametric Bayesian priors into models is
one approach to automatically
adjusting model capacity to the amount of available data: with small
datasets, models are less complex
(require storing fewer parameters in memory), whereas with larger datasets, models
are implicitly more complex
(require storing more parameters in memory).
Thus, nonparametric Bayesian priors satisfy frequentist intuitions
about model complexity within a fully Bayesian framework.
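
As a rough illustration of how capacity can grow with data, the sketch below simulates a Chinese restaurant process, the partition distribution induced by a Dirichlet process. It is not taken from the thesis: the function name `crp_table_counts`, the concentration value `alpha=1.0`, and the fixed seed are assumptions chosen for the example. Each occupied "table" stands for one mixture component whose parameters would need to be stored.

```python
import random


def crp_table_counts(n_customers, alpha, seed=0):
    """Seat customers by a Chinese restaurant process; return table sizes.

    Hypothetical helper for illustration only (not from the thesis).
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of customers seated at table k
    for i in range(n_customers):
        # Customer i joins table k with probability counts[k] / (i + alpha)
        # and opens a new table with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                counts[k] += 1
                break
        else:
            # No existing table was chosen, so open a new one.
            counts.append(1)
    return counts


# The number of occupied tables (components) never shrinks as data arrive.
for n in (10, 100, 1000):
    print(n, len(crp_table_counts(n, alpha=1.0)))
```

With a fixed seed, the seating of the first customers is identical across runs, so the number of occupied tables can only stay the same or increase as more data are added.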

This thesis presents several novel
machine learning models and applications that use
nonparametric Bayesian priors.
We introduce two novel models that use flat
Dirichlet process priors. The first is an infinite mixture
of experts model, which builds
a fully generative, joint density model of the input and output space.
The second is a Bayesian
biclustering model, which simultaneously
organizes a
data matrix into
block-constant biclusters.
The model is capable of efficiently processing very large, sparse matrices,
enabling cluster analysis on incomplete data matrices.

We introduce binary matrix factorization,
a novel matrix factorization model that, in contrast to
classic factorization methods, such as singular value decomposition,
decomposes a matrix using latent binary matrices.
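
To make the contrast with singular value decomposition concrete, the sketch below builds a small matrix from binary factors and compares it with the real-valued factors returned by SVD. The three-factor form `X = U W V^T`, the dimensions, and the random sampling of `U`, `V`, and `W` are assumptions made for illustration; the abstract states only that the decomposition uses latent binary matrices, and no inference procedure is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: I rows, J columns, K row features, L column features.
I, J, K, L = 6, 5, 3, 2

# Binary latent matrices: each row or column switches features on or off.
U = rng.integers(0, 2, size=(I, K))   # entries in {0, 1}
V = rng.integers(0, 2, size=(J, L))   # entries in {0, 1}

# Real-valued weights linking row features to column features (assumption).
W = rng.normal(size=(K, L))

# Matrix generated by the assumed binary factorization.
X = U @ W @ V.T

# For comparison, SVD factors the same matrix into real-valued,
# orthonormal singular vectors rather than binary indicators.
left, singular_values, right_t = np.linalg.svd(X, full_matrices=False)

print("binary U:\n", U)
print("first left singular vector:", np.round(left[:, 0], 3))
```

Under this assumed form, each entry of `X` is a sum of the weights in `W` selected by the active features of its row and column, which is what makes the binary factors directly interpretable as feature indicators.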

We describe two nonparametric Bayesian priors
over tree structures. The first is an infinitely exchangeable
generalization of the nested
Chinese restaurant process that generates
data-vectors at a single node in the tree.
The second is a novel, finitely exchangeable
prior that generates trees by first partitioning data indices into groups
and then by randomly
assigning groups to a tree.

We present two applications of the tree priors: the first
automatically learns probabilistic stick-figure models of motion-capture
data that recover
plausible structure and are robust to missing
marker data.
The second learns hierarchical
allocation models based on the latent Dirichlet allocation
topic model for document corpora,
where nodes in a topic-tree
are latent "super-topics", and nodes
in a document-tree are latent

The thesis concludes
with a summary of contributions, a discussion
of the models and their limitations, and a brief outline
of potential future research.

Date: 01 August 2008
Creators: Meeds, Edward
Contributors: Roweis, Sam
Source Sets: University of Toronto
Detected Language: English
Format: 12805176 bytes, application/pdf