Global ETD Search

Genre classification using syntactic features

This thesis addresses text classification for genre identification using different feature sets, with a focus on syntax-based features. We built our models with traditional machine learning algorithms, namely Naive Bayes, K-nearest neighbour, Support Vector Machine and Random Forest, to predict the literary genre of books. We trained the models on bag-of-words (BOW), bigram, syntactic-based bigram and emotional features, as well as combinations of these. On the test set, the best feature set, BOW combined with bigrams based on syntactic relations between words, improved the F1-score by 2% over the BOW-only baseline, which indicates that syntactic information has a positive impact on text classification.
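As a rough illustration of the best-performing feature set described above, the sketch below combines bag-of-words counts with bigrams drawn from syntactic relations. The abstract does not say how the syntactic bigrams were extracted, so this assumes they are head-dependent word pairs from a dependency parse. The function names, the feature-name prefixes and the hand-written parse are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def bow_features(tokens):
    """Count each lowercased token (bag-of-words)."""
    return Counter(f"bow={t.lower()}" for t in tokens)


def syntactic_bigram_features(tokens, heads):
    """Count head>dependent word pairs from a dependency parse.

    heads[i] is the index of the head of tokens[i], or -1 for the root.
    (Assumed representation; the thesis may define these differently.)
    """
    feats = Counter()
    for i, h in enumerate(heads):
        if h >= 0:  # the root has no head, so it yields no pair
            feats[f"dep={tokens[h].lower()}>{tokens[i].lower()}"] += 1
    return feats


def combined_features(tokens, heads):
    """Merge both feature sets into one sparse count vector."""
    return bow_features(tokens) + syntactic_bigram_features(tokens, heads)


# Hypothetical hand-written parse of one sentence, for illustration only.
tokens = ["The", "old", "detective", "opened", "the", "door"]
heads = [2, 2, 3, -1, 5, 3]
features = combined_features(tokens, heads)
```

Unlike adjacent-word bigrams, these pairs link words that are syntactically related even when they are not neighbours, such as "opened" and "door" here. The resulting counts could then be vectorised and passed to any of the classifiers named in the abstract.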

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-454667

genre classification

text classification

machine learning

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-454667
Date	January 2021
Creators	Brigadoi, Ivan
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
