Global ETD Search

Return to search

EXPLORING PSEUDO-TOPIC-MODELING FOR CREATING AUTOMATED DISTANT-ANNOTATION SYSTEMS

We explore the use a Latent Dirichlet Allocation (LDA) imitating pseudo-topic-model, based on our original relevance metric, as a tool to facilitate distant annotation of short (often one to two sentence or less) documents. Our exploration manifests as annotating tweets for emotions, this being the current use-case of interest to us, but we believe the method could be extended to any multi-class labeling task of documents of similar length. Tweets are gathered via the Twitter API using "track" terms thought likely to capture tweets with a greater chance of exhibiting each emotional class, 3,000 tweets for each of 26 topics anticipated to elicit emotional discourse. Our pseudo-topic-model is used to produce relevance-ranked vocabularies for each corpus of tweets and these are used to distribute emotional annotations to those tweets not manually annotated, magnifying the number of annotated tweets by a factor of 29. The vector labels the annotators produce for the topics are cascaded out to the tweets via three different schemes which are compared for performance by proxy through the competition of bidirectional-LSMTs trained using the tweets labeled at a distance. An SVM and two emotionally annotated vocabularies are also tested on each task to provide context and comparison.

distant annotation

emotion detection

Natural language processing

sentiment analysis

topic modeling

Identifer	oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3876
Date	01 September 2021
Creators	Sommers, Alexander Mitchell
Publisher	OpenSIUC
Source Sets	Southern Illinois University Carbondale
Detected Language	English
Type	text
Format	application/pdf
Source	Theses

Page generated in 0.0029 seconds

EXPLORING PSEUDO-TOPIC-MODELING FOR CREATING AUTOMATED DISTANT-ANNOTATION SYSTEMS

Description

Links & Downloads

Tags

Additional Fields