## Nonparametric Bayesian Methods for Extracting Structure from Data

One desirable property of machine learning algorithms is the ability to balance the number of parameters in a model in accordance with the amount of available data. Incorporating nonparametric Bayesian priors into models is one approach to automatically adjusting model capacity to the amount of available data: with small datasets, models are less complex (require storing fewer parameters in memory), whereas with larger datasets, models are implicitly more complex (require storing more parameters in memory). Thus, nonparametric Bayesian priors satisfy frequentist intuitions about model complexity within a fully Bayesian framework.
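As a rough illustration of capacity growing with data, the following sketch simulates the Chinese restaurant process, a standard constructive view of the Dirichlet process. It is not taken from the thesis: the function name `crp_table_counts`, the concentration parameter `alpha`, and the `seed` argument are assumptions made for this example.

```python
import random


def crp_table_counts(n_customers, alpha, seed=0):
    """Seat customers one at a time; return the size of each occupied table.

    Hypothetical helper for illustration only. Each table plays the role
    of one mixture component (one set of parameters to store).
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for n in range(n_customers):
        # With n customers already seated, a new table opens with
        # probability alpha / (n + alpha); existing table k is chosen
        # with probability tables[k] / (n + alpha).
        r = rng.uniform(0, n + alpha)
        if r < alpha or not tables:
            tables.append(1)
            continue
        r -= alpha
        for k, size in enumerate(tables):
            if r < size:
                tables[k] += 1
                break
            r -= size
        else:
            # Floating-point edge case: fall back to the last table.
            tables[-1] += 1
    return tables


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, len(crp_table_counts(n, alpha=1.0)))
```

With a fixed seed, the number of occupied tables can only stay the same or grow as more customers arrive, which mirrors the idea that larger datasets lead to implicitly more complex models.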

This thesis presents several novel machine learning models and applications that use nonparametric Bayesian priors.

We introduce two novel models that use flat Dirichlet process priors. The first is an infinite mixture of experts model, which builds a fully generative, joint density model of the input and output space. The second is a Bayesian biclustering model, which simultaneously organizes a data matrix into block-constant biclusters. The model is capable of efficiently processing very large, sparse matrices, enabling cluster analysis on incomplete data matrices.

We introduce binary matrix factorization, a novel matrix factorization model that, in contrast to classic factorization methods, such as singular value decomposition, decomposes a matrix using latent binary matrices.
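To make the idea of latent binary factors concrete, here is a minimal sketch of one possible form of such a decomposition, in which a data matrix is reconstructed as the product of a binary row-feature matrix, a real-valued weight matrix, and a binary column-feature matrix. The three-factor form, the weight matrix `W`, and the function name `binary_factor_product` are assumptions made for illustration; the abstract states only that latent binary matrices are used.

```python
def binary_factor_product(U, W, V):
    """Return the matrix U W V^T as nested lists.

    Hypothetical helper for illustration only.
    U: N x K binary matrix (row features)
    W: K x L real-valued weights (assumed, not stated in the abstract)
    V: M x L binary matrix (column features)
    """
    n_rows, n_cols = len(U), len(V)
    k_dim, l_dim = len(W), len(W[0])
    return [
        [
            sum(
                U[i][k] * W[k][l] * V[j][l]
                for k in range(k_dim)
                for l in range(l_dim)
            )
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# Toy example: each entry is the sum of the weights switched on by the
# binary features active for that row and that column.
U = [[1, 0], [1, 1], [0, 1]]
V = [[1, 0], [0, 1]]
W = [[2.0, -1.0], [0.5, 3.0]]
X = binary_factor_product(U, W, V)
```

Unlike singular value decomposition, where the factors are real-valued and orthogonal, the factors here only switch weights on or off, so each entry of `X` is a sum over the active feature pairs.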

We describe two nonparametric Bayesian priors over tree structures. The first is an infinitely exchangeable generalization of the nested Chinese restaurant process that generates data-vectors at a single node in the tree. The second is a novel, finitely exchangeable prior that generates trees by first partitioning data indices into groups and then randomly assigning groups to a tree.

We present two applications of the tree priors. The first automatically learns probabilistic stick-figure models of motion-capture data that recover plausible structure and are robust to missing marker data. The second learns hierarchical allocation models based on the latent Dirichlet allocation topic model for document corpora, where nodes in a topic-tree are latent "super-topics", and nodes in a document-tree are latent categories.

The thesis concludes with a summary of contributions, a discussion of the models and their limitations, and a brief outline of potential future research directions.

Identifier | oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/11235 |

Date | 01 August 2008 |

Creators | Meeds, Edward |

Contributors | Roweis, Sam |

Source Sets | University of Toronto |

Language | en_ca |

Detected Language | English |

Type | Thesis |

Format | 12805176 bytes, application/pdf |
