Global ETD Search

A comperative study of text classification models on invoices : The feasibility of different machine learning algorithms and their accuracy

Text classification is becoming more important for companies as an increasing amount of digital data is made available. The aim is to investigate whether five different machine learning algorithms can be used to automate the classification of invoice data, and to determine which one achieves the highest accuracy. At a later stage, the algorithms are combined in an attempt to achieve better results. N-grams are used, and the results are compared in terms of the total classification accuracy of each algorithm. The chosen algorithms were run using scikit-learn, a Python library that implements them. Data was collected and generated to represent the data found on a real invoice after extraction. The results of this thesis show that machine learning can be used for this type of problem. The highest-scoring algorithm (LinearSVC from scikit-learn) classifies 86% of all samples correctly, which is 16 percentage points above the acceptable level of 70%.
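As a rough illustration of the kind of setup the abstract describes, the sketch below trains scikit-learn's LinearSVC on n-gram features and reports total accuracy. It is not the thesis's code: the sample invoice fragments, the label names, the use of TF-IDF weighting, and the word-level n-gram range (1, 2) are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical invoice fragments and field labels (illustrative only,
# not data from the thesis).
train_texts = [
    "Invoice number 10045", "Invoice no 77812", "Invoice number 55120",
    "Due date 2018-01-31", "Due date 2017-12-15", "Payment due date 2018-02-28",
    "Total amount 1250.00 SEK", "Total amount 310.50 SEK", "Amount to pay 99.00 SEK",
]
train_labels = [
    "invoice_number", "invoice_number", "invoice_number",
    "due_date", "due_date", "due_date",
    "total_amount", "total_amount", "total_amount",
]
test_texts = ["Invoice number 90311", "Due date 2018-03-10", "Total amount 48.75 SEK"]
test_labels = ["invoice_number", "due_date", "total_amount"]


def build_model():
    """Word uni- and bigram TF-IDF features fed into a linear SVM.

    The n-gram range and TF-IDF weighting are assumptions for this sketch.
    """
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LinearSVC(),
    )


model = build_model()
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

# Total accuracy: the fraction of test samples classified correctly.
accuracy = accuracy_score(test_labels, predictions)
print(f"accuracy: {accuracy:.2f}")
```

With a realistic dataset, the same accuracy figure could be compared against a chosen threshold such as the 70% level mentioned in the abstract.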

http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15647

information retrieval

ensemble learning

Computer Sciences

Datavetenskap (datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:his-15647
Date	January 2018
Creators	Ekström, Linus; Augustsson, Andreas
Publisher	Högskolan i Skövde, Institutionen för informationsteknologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
