Automatic literary genre classification presents a challenging task for Natural Language Processing (NLP) systems, mainly because literary texts have deeper levels of meanings, hold distinctive themes, and communicate certain messages and emotions. We conduct a study where we experiment with building literary genre classifiers based on emotions in novels, to investigate the effects that features pertinent to emotions have on models of genre prediction. We begin by performing an analysis of emotions describing emotional composition and density in the dataset. The experiments are carried out on a dataset consisting of novels categorized in eight different genres. Genre prediction models are built using three algorithms: Random Forest, Support Vector Machine, and k-Nearest Neighbor. We build models based on emotion-words counts and emotional words in a novel, and compare them to models of commonly used features, the bag-of-words and the TF-IDF features. Moreover, we use a feature selection dimensionality reduction procedure on the TF-IDF feature set and study its impact on classification performance. Finally, we train and test the classifiers on a combination of the two most optimal emotion-related feature sets, and compare them on classifiers trained and tested on a combination of bag-of-words and the reduced TF-IDF features. Our results confirm that: using features of emotional content in novels improves classification performance a 75% F1 compared to a bag-of-words baseline of 71% F1; TF-IDF feature filtering method positively impacts genre classification performance on literary texts.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-447148 |
Date | January 2021 |
Creators | Yako, Mary |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0021 seconds