Spelling suggestions: "subject:"batural language procession (NLP)"" "subject:"batural language processions (NLP)""
1 |
Capturing Style Through Large Language Models - An Authorship PerspectiveAnuj Dubey (18398505) 10 December 2024 (has links)
<p dir="ltr">This research investigates the use of Large Language Model (LLM) embeddings to capture the unique stylistic features of authors in Authorship Attribution (AA) tasks. Specifically, the focus of this research is on evaluating whether LLM-generated embeddings can effectively capture stylistic nuances that distinguish different authors, ultimately assessing their utility in tasks such as authorship attribution and clustering.The dataset comprises news articles from The Guardian authored by multiple writers, and embeddings were generated using OpenAI's text-embedding-ada-002 model. These embeddings were subsequently passed through a Siamese network with the objective of determining whether pairs of texts were authored by the same individual. The resulting model was used to generate style embeddings for unseen articles, which were then evaluated through classification and cluster analysis to assess their effectiveness in identifying individual authors across varying text samples. The classification task tested the model's accuracy in distinguishing authors, while the clustering analysis examined whether style embeddings primarily captured authorial identity or reflected domain-specific topics.</p><p dir="ltr">Our findings demonstrate that the proposed architecture achieves high accuracy for authors not previously encountered, outperforming traditional stylometric features and highlighting the effectiveness of LLM-based style embeddings. Additionally, our experiments reveal that authorship attribution accuracy decreases as the number of authors increases, yet improves with longer text lengths. </p><p dir="ltr"><br></p>
|
Page generated in 0.1238 seconds