This paper explores topic modeling via unsupervised non-negative matrix factorization. This technique is used on a variety of sources in order to extract salient topics. From these topics, hidden entity networks are discovered and visualized in a graph representation. In addition, other visualization techniques such as examining the time series of a topic and examining the top words of a topic are used for evaluation and analysis. There is a large software component to this project, and so this paper will also focus on the design decisions that were made in order to make the program developed as versatile and extensible as possible.
Identifer | oai:union.ndltd.org:CLAREMONT/oai:scholarship.claremont.edu:cmc_theses-2667 |
Date | 01 January 2017 |
Creators | Cooper, Wyatt |
Publisher | Scholarship @ Claremont |
Source Sets | Claremont Colleges |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | CMC Senior Theses |
Rights | © 2017 Wyatt J Cooper, default |
Page generated in 0.0017 seconds