This thesis explores the possibilities and components of employing automated text classification techniques to classify collections of narrative fiction by genre, and examines which linguistic features are prominent in distinguishing genres of fiction. The historical traditions, current practices, and theories in the field of fiction classification are outlined, along with central concepts of classification and genre theory. Linguistic features are also introduced and hypothesized to be capable of distinguishing genres of fiction. The thesis also reviews the foundations and current state of automated text classification, and reasons about what constitutes topical and stylistic features in relation to fiction. Knowledge gaps are identified between automated text classification and traditional fiction classification, and concerning the potentially genre-distinguishing qualities of topical and stylistic features. The main experiment, around which the thesis is centered, is divided into two parts. The first part employs and evaluates k-nearest neighbors (kNN) and support vector machine (SVM) classifiers on a collection of fiction documents across four genres of fiction. In the second part, feature selection methods are employed to inspect distinguishing features across the collection. The findings suggest that automated techniques have potential for classifying fiction, and illustrate feature patterns that are argued to distinguish each of the four genres. Some suggestions for further research are also proposed.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hb-22862 |
Date | January 2019 |
Creators | Falk, Olof |
Publisher | Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |