TimeLink: Visualizing Diachronic Word Embeddings and Topics

Analyzing a collection of documents generated over time is a daunting task. A natural way to ease it is to summarize the documents into the topics they contain. The temporal aspect of topics frames relevance by when a topic is introduced and when it stops being mentioned. This creates trends and patterns that can be traced through individual key terms taken from the corpus, and those trends need a way to be visualized through the key terms. A visual system that supports this analysis can help users quickly gain insights from the data, significantly easing the burden of the original analysis technique. However, creating a visual system for terms is not easy. Word embeddings represent words as numeric vectors, which makes it possible to draw simple charts, such as scatter plots, from them. These charts become less effective as the number of time slices grows and points begin to overlap. A visualization method that addresses these problems while also presenting diachronic word embeddings in an engaging way with added semantic meaning is hard to find. TimeLink addresses these problems. It is proposed as a dashboard system that helps users gain insights from the movement of diachronic word embeddings. It comprises a Sankey diagram showing the path of a selected key term to a cluster in a time period. This local cluster is also mapped to a global topic based on the original corpus of documents from which the key terms are drawn. The dashboard gives users tools for focused analysis, such as filtering key terms and emphasizing specific clusters. TimeLink provides insightful visualizations focused on temporal word embeddings while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time.
/ Master of Science / Analyzing documents collected over time is a daunting task. Grouping documents into topics helps frame relevance by when topics are introduced and when they fade. Topics also make it possible to visualize trends and patterns. A visual system that supports this analysis can help users quickly gain insights from the data, significantly easing the burden of the original analysis technique of browsing individual documents. A visualization system for this analysis typically focuses on the terms that affect established topics. Some visualization methods, like scatter plots, do this but become less effective as more data is introduced. TimeLink is proposed as a dashboard system to help users draw insights from the development of terms over time. In addition to addressing problems in other visualizations, it visualizes the movement of terms intuitively and adds semantic meaning. TimeLink provides insightful visualizations focused on the movement of terms while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/119395
Date: 11 June 2024
Creators: Williams, Lemara Faith
Contributors: Computer Science & Applications; North, Christopher L.; Danielson, Thomas Lee; Mayer, Brian Benjamin; Chen, Yan; Faust, Rebecca Jane
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
