Text categorisation is an important technique for organising text data and making information easier to find on the World Wide Web. Text can be categorised automatically with machine learning classifiers, which must first be trained on labelled data before they can predict categories for new input. The authors investigated how accurately two classifiers categorise recent news articles when the classifier model has been trained on older news articles. To answer this question, the authors conducted an experiment with the Naive Bayes and Support Vector Machine classifiers: models of both classifiers were trained on news articles from 2005 and then tested on news articles from both 2005 and 2017, and the results were compared. Both classifiers performed considerably worse on the 2017 articles than on articles from the same year as the training data.
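The evaluation protocol described above (train on older articles, then test on same-year and newer articles) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It implements only a multinomial Naive Bayes classifier with add-one smoothing and whitespace tokenisation. A Support Vector Machine could be evaluated with the same `accuracy` harness. The `train_2005`, `test_2005` and `test_2017` corpora are hypothetical toy data invented for this example. They are not the news articles used in the thesis.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately minimal tokeniser: lowercase and split on whitespace.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs):
        # docs: list of (label, text) pairs.
        self.class_counts = Counter(label for label, _ in docs)
        self.word_counts = defaultdict(Counter)
        for label, text in docs:
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(docs)
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):  # sorted for deterministic ties
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / self.n_docs)
            denom = self.total_words[label] + len(self.vocab)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training carry no evidence
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs):
    correct = sum(model.predict(text) == label for label, text in docs)
    return correct / len(docs)


# Hypothetical toy corpora for illustration only (NOT the thesis data).
train_2005 = [
    ("sport", "football match goal striker"),
    ("sport", "tennis match final serve"),
    ("tech", "ipod music player apple"),
    ("tech", "broadband internet modem download"),
]
test_2005 = [
    ("sport", "football final goal"),
    ("tech", "apple ipod download"),
]
test_2017 = [
    ("sport", "football transfer window"),
    ("sport", "esports player tournament"),
    ("tech", "smartphone app internet"),
    ("tech", "dating app match feature"),
]

model = MultinomialNaiveBayes().fit(train_2005)
print("2005 test accuracy:", accuracy(model, test_2005))  # 1.0
print("2017 test accuracy:", accuracy(model, test_2017))  # 0.5
```

On this toy data the model classifies both same-year documents correctly but only half of the newer ones. Terms such as `esports` and `smartphone` never occur in the training data, so they are ignored. The decision then rests on older words like `player` and `match`, which appear in the other category's training documents.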
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-64769 |
Date | January 2017 |
Creators | Rydberg, Filip, Tornfors, Jonas |
Publisher | Linnéuniversitetet, Institutionen för datavetenskap (DV) (Linnaeus University, Department of Computer Science) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |