<p dir="ltr">Short text has historically proven challenging to work with in many Natural Language Processing (NLP) applications. Traditional tasks such as authorship attribution benefit from longer samples of work from which to derive features. Even newer tasks, such as synthetic text detection, struggle to distinguish between authentic and synthetic text in short form. Due to the widespread use of social media and the proliferation of freely available Large Language Models (LLMs), such as the GPT series from OpenAI and Bard from Google, there has been a deluge of short-form text on the internet in recent years. Short-form text has either become or remained a staple in several ubiquitous areas such as schoolwork, entertainment, social media, and academia. This thesis seeks to analyze short text through the lens of NLP tasks such as synthetic text detection, LLM authorship attribution, derived engagement, and predicted engagement. The first focus explores detection in the binary case of determining whether tweets are synthetically generated, and proposes a novel feature extraction technique to improve classifier results. The second focus further explores the challenges that short-form text presents in determining authorship, along with a range of related difficulties, and presents a potential workaround for those issues. The final focus attempts to predict social media engagement from NLP representations of comments, and yields new understanding of the social media environment and the multitude of additional factors required for engagement prediction.</p>
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/26335663 |
Date | 22 July 2024 |
Creators | Ryan J Schwarz (19178926) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/AN_ANALYSIS_ON_SHORT-FORM_TEXT_AND_DERIVED_ENGAGEMENT/26335663 |