• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Computer-Aided Optically Scanned Document Information Extraction System

Mei, Zhijie January 2020 (has links)
This paper introduced a Computer-Aided Optically Scanned Document Information Extraction System. It could extract information including invoice No., issued date, buyer, etc., from the optically scanned document to meet the demand of customs declaration companies. The system output the structured information to a relational database. In detail, a software architecture for the information extraction of diverse-structure optically scanned document is designed. In this system, the original document is classified firstly. It would put into template-based extraction to improve the extraction performance if its template is pre-defined in the system. Then, a method for image enhancement to improve the image classification is proposed. This method aims to optimize the accuracy of neural network model by extracting the template-related feature and actively removing the unrelated feature. Lastly, the above system is implemented in this paper. This extraction are programed in Python which is a cross-platform languages. This system comprises three parts, classification module, template-based extraction and non-template extraction all of which have APIs and could be ran independently. This feature make this system flexible and easy to customization for the further demand. 445 real-world customs document images were input to evaluate the system. The result revealed that the introduced system ensured the diverse document support with non-template extraction and reached the overall high performance with template-based extraction showing the goal was basically achieved.

Page generated in 0.457 seconds