Return to search

Hierarchical text classification of fiction books : With Thema subject categories

Categorizing books and literature of any genre and subject area is a vital task for publishers which seek to distribute their books to the appropriate audiences. It is common that different countries use different subject categorization schemes, which makes international book trading more difficult due to the need to categorize books from scratch once they reach another country. A solution to this problem has been proposed in the form of an international standard called Thema, which encompasses thousands of hierarchical subject categories. However, because this scheme is quite recent, many books published before its creation are yet to be assigned subject categories. It also is often the case that even recent books are not categorized. In this work, methods for automatic categorization of books are investigated, based on multinomial Naive Bayes and Facebook's classifier fastText. The results show some amount of promise for both classifiers, but overall, due to data imbalance and a very long training time that made it difficult to use more data, it is not possible to determine with certainty which classifier actually is best.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-154469
Date January 2019
CreatorsReinaudo, Alice
PublisherLinköpings universitet, Interaktiva och kognitiva system
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0027 seconds