Return to search

Pachinko allocation: DAG-structured mixture models of topic correlations

Statistical topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, the majority of existing approaches capture no or limited correlations between topics. We propose the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). We present various structures within this framework, different parameterizations of topic distributions, and an extension to capture dynamic patterns of topic correlations. We also introduce a non-parametric Bayesian prior to automatically learn the topic structure from data. The model is evaluated on document classification, likelihood of held-out data, the ability to support fine-grained topics, and topical keyword coherence. With a highly-scalable approximation, PAM has also been applied to discover topic hierarchies in very large datasets.

Identiferoai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-4826
Date01 January 2007
CreatorsLi, Wei
PublisherScholarWorks@UMass Amherst
Source SetsUniversity of Massachusetts, Amherst
LanguageEnglish
Detected LanguageEnglish
Typetext
SourceDoctoral Dissertations Available from Proquest

Page generated in 0.0016 seconds