Anchor-based Topic Modeling with Human Interpretable Results / Tolkningsbara ämnesmodeller baserade på ankarord

Topic models are useful tools for exploring large data sets of textual content by exposing a generative process from which the text was produced. Anchor-based topic models utilize the anchor word assumption to define a set of algorithms with provable guarantees that recover the underlying topics with a run time practically independent of corpus size. A number of extensions to the initial anchor word-based algorithms, as well as enhancements made to tangential models, have been proposed that improve the intrinsic characteristics of the models, making them more interpretable by humans. This thesis evaluates improvements to human interpretability due to: low-dimensional word embeddings in combination with a regularized objective function, automatic topic merging using tandem anchors, and the use of word embeddings to synthetically increase corpus density. Results show that tandem anchors are viable vehicles for automatic topic merging, and that using word embeddings significantly improves the original anchor method across all measured metrics. Combining low-dimensional embeddings and a regularized objective results in computational downsides with small or no improvements to the metrics measured.

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-168134

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-168134
Date	January 2020
Creators	Andersson, Henrik
Publisher	Linköpings universitet, Interaktiva och kognitiva system
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
