Return to search

EXPLORING PSEUDO-TOPIC-MODELING FOR CREATING AUTOMATED DISTANT-ANNOTATION SYSTEMS

We explore the use a Latent Dirichlet Allocation (LDA) imitating pseudo-topic-model, based on our original relevance metric, as a tool to facilitate distant annotation of short (often one to two sentence or less) documents. Our exploration manifests as annotating tweets for emotions, this being the current use-case of interest to us, but we believe the method could be extended to any multi-class labeling task of documents of similar length. Tweets are gathered via the Twitter API using "track" terms thought likely to capture tweets with a greater chance of exhibiting each emotional class, 3,000 tweets for each of 26 topics anticipated to elicit emotional discourse. Our pseudo-topic-model is used to produce relevance-ranked vocabularies for each corpus of tweets and these are used to distribute emotional annotations to those tweets not manually annotated, magnifying the number of annotated tweets by a factor of 29. The vector labels the annotators produce for the topics are cascaded out to the tweets via three different schemes which are compared for performance by proxy through the competition of bidirectional-LSMTs trained using the tweets labeled at a distance. An SVM and two emotionally annotated vocabularies are also tested on each task to provide context and comparison.

Identiferoai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3876
Date01 September 2021
CreatorsSommers, Alexander Mitchell
PublisherOpenSIUC
Source SetsSouthern Illinois University Carbondale
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceTheses

Page generated in 0.0029 seconds