Using dated training sets for classifying recent news articles with Naive Bayes and Support Vector Machines : An experiment comparing the accuracy of classifications using test sets from 2005 and 2017

Text categorisation is important for organising text data and making information easier to find on the World Wide Web. Text data can be categorised using machine learning classifiers, which must be trained on data before they can predict the category of new input. The authors investigated how accurately two classifiers categorise recent news articles when the models are trained on older news articles. They chose the Naive Bayes and Support Vector Machine classifiers and conducted an experiment: models of both classifiers were trained on news articles from 2005 and then tested on news articles from 2005 and from 2017 so that the results could be compared. Both classifiers performed considerably worse on the 2017 articles than on articles from the same year as the training data.
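The experimental design described above (train on older articles, then compare accuracy on a same-period test set and a later test set) can be illustrated with a minimal sketch. This is not the authors' code, data, or results. It implements only a small multinomial Naive Bayes classifier with add-one smoothing in plain Python and leaves out the Support Vector Machine entirely. The function names, the handling of unseen words, and all of the example sentences are assumptions invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing over the training vocabulary.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(predict(model, text) == label for text, label in docs)
    return correct / len(docs)


# Hypothetical toy data standing in for the 2005 training articles.
train_old = [
    ("team wins football match", "sport"),
    ("striker scores goal in cup final", "sport"),
    ("new mobile phone released", "tech"),
    ("software update for desktop computer", "tech"),
]
# Hypothetical test set using vocabulary from the training period.
test_same_period = [
    ("football team scores goal", "sport"),
    ("desktop software released", "tech"),
]
# Hypothetical later test set that partly uses vocabulary unseen in training.
test_later_period = [
    ("smartphone app streaming launch", "tech"),
    ("team wins match", "sport"),
]

model = train_naive_bayes(train_old)
acc_same = accuracy(model, test_same_period)
acc_later = accuracy(model, test_later_period)
print("same-period accuracy:", acc_same)
print("later-period accuracy:", acc_later)
```

In this toy setup the later test set scores lower because one of its documents consists only of words absent from the training vocabulary, so the model falls back on the class prior. That is one plausible mechanism behind the kind of accuracy drop the abstract reports, not a finding taken from the thesis.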

http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-64769

News Articles

Machine Learning

Naive Bayes

Support vector machine

SVM

Text categorisation

Computer Sciences

Datavetenskap (datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-64769
Date	January 2017
Creators	Rydberg, Filip; Tornfors, Jonas
Publisher	Linnéuniversitetet, Institutionen för datavetenskap (DV)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
