
Using dated training sets for classifying recent news articles with Naive Bayes and Support Vector Machines: An experiment comparing the accuracy of classifications using test sets from 2005 and 2017

Text categorisation is an important tool for organising text data and making information easier to find on the World Wide Web. It can be performed by machine learning classifiers, which must be trained on data before they can predict a category for future input. The authors investigate how accurately two classifiers categorise recent news articles when their models are trained on older news articles. To answer this, they selected the Naive Bayes and Support Vector Machine classifiers and conducted an experiment: models of both classifiers were trained on news articles from 2005 and then tested on news articles from 2005 and from 2017, so that the results could be compared. Both classifiers performed considerably worse on the news articles from 2017 than on news articles from the same year as the training data.
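
The experimental design described above (train on older articles, then test on same-period and later articles) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it covers a hand-written multinomial Naive Bayes classifier and omits the Support Vector Machine, the whitespace tokenisation and add-one smoothing are assumptions, and the tiny article snippets and the names `train_old`, `test_same_period` and `test_later_period` are invented for the example and are not the thesis data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    class_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(predict(model, text) == label for text, label in docs)
    return correct / len(docs)


# Invented toy snippets for illustration only, NOT the thesis corpus.
train_old = [
    ("team wins cup final match", "sport"),
    ("striker scores goal in league match", "sport"),
    ("new mobile phone released", "tech"),
    ("software update for desktop computer", "tech"),
]
test_same_period = [
    ("league team scores goal", "sport"),
    ("computer software released", "tech"),
]
test_later_period = [
    ("esports tournament streamed on mobile app", "sport"),
    ("smartphone app update", "tech"),
]

model = train_naive_bayes(train_old)
acc_same = accuracy(model, test_same_period)
acc_later = accuracy(model, test_later_period)
print(f"same-period accuracy:  {acc_same:.2f}")
print(f"later-period accuracy: {acc_later:.2f}")
```

On this toy data the later-period test set scores lower because most of its vocabulary never occurs in the training set, and the one overlapping word in the first later-period snippet points to the wrong category. The numbers say nothing about the size of the effect reported in the thesis.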

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-64769
Date: January 2017
Creators: Rydberg, Filip; Tornfors, Jonas
Publisher: Linnéuniversitetet, Institutionen för datavetenskap (DV)
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
