Return to search

Convolutional Neural Networks for Named Entity Recognition in Images of Documents

This work researches named entity recognition (NER) with respect to images of documents with a domain-specific layout, by means of Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this work. An NER task is first performed statically, where a static number of entity classes is extracted per document. Networks based on the deep VGG-16 network are used for this task. Here, experimental evaluation shows that framing the task as a classification task, where the network classifies each bounding box coordinate separately, leads to the best network performance. Also, a multi-headed architecture is introduced, where the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, it is shown that transfer learning does not improve performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognise document templates, rather than the entities themselves, and therefore do not generalize well to new, unseen templates. For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves comparable performance to the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image and suggestions are made to improve its performance.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-191213
Date January 2016
Creatorsvan de Kerkhof, Jan
PublisherKTH, Skolan för datavetenskap och kommunikation (CSC), -
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0932 seconds