Return to search

Anchor-based Topic Modeling with Human Interpretable Results / Tolkningsbara ämnesmodeller baserade på ankarord

Topic models are useful tools for exploring large data sets of textual content by exposing a generative process from which the text was produced. Anchor-based topic models utilize the anchor word assumption to define a set of algorithms with provable guarantees which recover the underlying topics with a run time practically independent of corpus size. A number of extensions to the initial anchor word-based algorithms, and enhancements made to tangential models, have been proposed which improve the intrinsic characteristics of the model making them more interpretable by humans. This thesis evaluates improvements to human interpretability due to: low-dimensional word embeddings in combination with a regularized objective function, automatic topic merging using tandem anchors, and utilizing word embeddings to synthetically increase corpus density. Results show that tandem anchors are viable vehicles for automatic topic merging, and that using word embeddings significantly improves the original anchor method across all measured metrics. Combining low-dimensional embeddings and a regularized objective results in computational downsides with small or no improvements to the metrics measured.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-168134
Date January 2020
CreatorsAndersson, Henrik
PublisherLinköpings universitet, Interaktiva och kognitiva system
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0022 seconds