Automatization is a desirable feature in many business areas. Manually extracting information from a physical object such as a receipt is something that can be automated to save resources for a company or a private person. In this paper the process will be described of combining an already existing OCR engine with a developed python script to achieve data extraction of valuable information from a digital image of a receipt. Values such as VAT, VAT%, date, total-, gross-, and net-cost; will be considered as valuable information. This is a feature that has already been implemented in existing applications. However, the company that I have done this project for are interested in creating their own version. This project is an experiment to see if it is possible to implement such an application using restricted resources. To develop a program that can extract the information mentioned above. In this paper you will be guided though the process of the development of the program. As well as indulging in the mindset, findings and the steps taken to overcome the problems encountered along the way. The program achieved a success rate of 86.6% in extracting the most valuable information: total cost, VAT% and date from a set of 53 receipts originated from 34 separate establishments.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-88602 |
Date | January 2019 |
Creators | Johansson, Elias |
Publisher | Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.004 seconds