Topic Regression

Text documents are generally accompanied by non-textual information, such as authors, dates, publication sources, and, increasingly, automatically recognized named entities. Work in text analysis has often involved predicting these non-text values based on text data for tasks such as document classification and author identification. This thesis considers the opposite problem: predicting the textual content of documents based on non-text data. In this work I study several regression-based methods for estimating the influence of specific metadata elements in determining the content of text documents. Such topic regression methods allow users of document collections to test hypotheses about the underlying environments that produced those documents.

Machine Learning

Topic Modeling

Computer Sciences

Identifier	oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:open_access_dissertations-1520
Date	01 February 2012
Creators	Mimno, David
Publisher	ScholarWorks@UMass Amherst
Source Sets	University of Massachusetts, Amherst
Detected Language	English
Type	text
Format	application/pdf
Source	Open Access Dissertations
