Stylometry: Quantifying Classic Literature For Authorship Attribution: A Machine Learning Approach

Classic literature is rich, whether linguistically, historically, or culturally, which makes it valuable for future studies. This project therefore conducted a stylometric analysis of a set of 48 classic books, adopting an approach from a related work: divide the books into text segments, quantify the resulting segments, and use the quantified values to understand the linguistic attributes of the books. Beyond this analysis, the project carried out classification tasks with two further objectives. First, the study used the quantified values of the text segments as input to advanced models, LightGBM and TabNet, to assess how well this approach applies to authorship attribution. Second, the study applied a state-of-the-art model, RoBERTa, to the segmented texts themselves to evaluate that model's performance in authorship attribution. The results uncovered the characteristics of the books to a reasonable degree. For the authorship attribution tasks, the results suggest that segmenting and quantifying text through stylometric analysis, combined with supervised machine learning algorithms, is practical. This approach shows promise but may require further improvement to achieve optimal performance. Lastly, RoBERTa demonstrated high performance in authorship attribution tasks.
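
The segment-then-quantify step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the segment length or the stylometric features the thesis used, so the function names (`segment_text`, `quantify_segment`), the 12-word segment size, and the four features below are all assumptions chosen for illustration, not the authors' method.

```python
import re
from statistics import mean


def segment_text(text, words_per_segment=200):
    """Split text into consecutive segments of at most words_per_segment words.

    The segment size is a hypothetical choice, not taken from the thesis.
    """
    words = text.split()
    return [
        " ".join(words[i:i + words_per_segment])
        for i in range(0, len(words), words_per_segment)
    ]


def quantify_segment(segment):
    """Map one text segment to a few illustrative stylometric features."""
    tokens = re.findall(r"[A-Za-z']+", segment.lower())
    sentences = [s for s in re.split(r"[.!?]+", segment) if s.strip()]
    if not tokens:
        return {
            "avg_word_len": 0.0,
            "type_token_ratio": 0.0,
            "avg_sentence_len": 0.0,
            "comma_rate": 0.0,
        }
    return {
        # Mean number of characters per word token.
        "avg_word_len": mean(len(t) for t in tokens),
        # Distinct word forms divided by total word tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Word tokens per sentence (at least one sentence assumed).
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        # Commas per word token.
        "comma_rate": segment.count(",") / len(tokens),
    }


sample = (
    "It was the best of times, it was the worst of times. "
    "It was the age of wisdom."
)
segments = segment_text(sample, words_per_segment=12)
features = [quantify_segment(s) for s in segments]
```

Each dictionary in `features` is one row of quantified values for one segment. Rows of this kind, labeled with the author of the source book, are the sort of tabular input that a supervised classifier such as LightGBM or TabNet could be trained on.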

http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-106112

Authorship Attribution

Classic Literature Analysis

Multiclass Classification

Computer Sciences

Computer and Information Sciences

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ltu-106112
Date	January 2024
Creators	Yousif, Jacob, Scarano, Donato
Publisher	Luleå tekniska universitet, Institutionen för system- och rymdteknik
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
