
The crucial parts of text classification with TensorFlow.js and categorisation of news articles

Text classification is a subset of machine learning used to label texts such as tweets, emails, news headlines or articles with tags or categories. Because news publishing can involve uncertainty in how articles are categorised, text classification could categorise articles autonomously and distinguish unclear categorisations. The TensorFlow library provides operations and tools for the machine learning workflow. This paper focuses on the crucial parts of working with machine learning in TensorFlow.js and on the extent to which such a model can categorise a news article. The authors evaluate different models to analyse how optimising the settings affects the accuracy of the model. The research combined a literature study of official documentation and peer-reviewed reports with an empirical experiment in which machine learning models were trained in TensorFlow.js. The most accurate model reached 87.17% accuracy; it was trained on 1000 articles using the ReLU and Softmax activation functions and the mean squared error loss function. The least accurate model reached 75.5%, using Sigmoid activation functions and categorical cross-entropy on the 5000-article training set. The crucial parts of this development were: the optimizer function, the loss function, batch size, activation functions, labelled training and test data, the normalise function, the shapes of the layers, and computing power. There are several parts and functions to take into consideration when developing a text classification model in TensorFlow.js. The training process needs to be performed multiple times, as many parameters have an effect on the model results. The results can be improved by searching for the best combination of functions and parameters.
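
For readers unfamiliar with the activation and loss functions named in the abstract, the following is a minimal sketch of their standard textbook definitions in plain TypeScript. It is not the authors' code and does not call the TensorFlow.js API; all helper names (`relu`, `sigmoid`, `softmax`, `meanSquaredError`, `categoricalCrossentropy`, `argMax`) are hypothetical and chosen only for illustration, and the one-hot label format is an assumption.

```typescript
// Illustrative definitions only; these helpers are NOT TensorFlow.js functions.

// ReLU activation: negative inputs become 0, positive inputs pass through.
function relu(xs: number[]): number[] {
  return xs.map((x) => Math.max(0, x));
}

// Sigmoid activation: squashes each input into the range (0, 1).
function sigmoid(xs: number[]): number[] {
  return xs.map((x) => 1 / (1 + Math.exp(-x)));
}

// Softmax activation: turns raw scores into probabilities that sum to 1.
// The maximum is subtracted first for numerical stability.
function softmax(xs: number[]): number[] {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Mean squared error between a one-hot label and predicted probabilities.
function meanSquaredError(yTrue: number[], yPred: number[]): number {
  const total = yTrue.reduce((acc, y, i) => {
    const diff = y - yPred[i];
    return acc + diff * diff;
  }, 0);
  return total / yTrue.length;
}

// Categorical cross-entropy between a one-hot label and predicted
// probabilities. Predictions are clamped to eps to avoid log(0).
function categoricalCrossentropy(
  yTrue: number[],
  yPred: number[],
  eps: number = 1e-7
): number {
  const total = yTrue.reduce(
    (acc, y, i) => acc + y * Math.log(Math.max(yPred[i], eps)),
    0
  );
  return -total;
}

// Index of the largest value, i.e. the predicted category.
function argMax(xs: number[]): number {
  return xs.indexOf(Math.max(...xs));
}
```

In this sketch, a news article's raw output scores would pass through `softmax`, the predicted category would be read off with `argMax`, and either loss function could compare the probabilities against the article's one-hot category label during training.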

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:bth-19813
Date January 2020
Creators Nordberg, Gustav, Grandien, Jesper
Publisher Blekinge Tekniska Högskola, Institutionen för programvaruteknik
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
