
Understanding Structured Documents with a Strong Layout

This work focuses on named entity recognition (NER) on documents with a strong layout, using deep recurrent neural networks. Examples of such documents are receipts, invoices, forms, and scientific papers; the latter are used in this research. The problem of NER on structured documents is modeled in two ways. First, it is modeled as sequence labeling, where every word or character has to be labeled as belonging to one of the entity classes. Secondly, it is modeled in a way that is typical for object detection in images: here the network outputs bounding boxes around words belonging to the same entity class. To perform this task successfully, not only the words themselves are important but also their locations. Multiple ways of encoding these locations have been researched; using the position relative to the previous word has proven the most effective. Experiments have revealed that for sequence labeling it works best to split the documents into multiple smaller sequences of size 200 and process these with two bi-directional stateful LSTM layers. In this model, the last hidden state of an LSTM is re-used as the initial state for the next partial sequence of a document. This model achieves an average F1 score of 94.2% across all classes. The models that output bounding boxes do not perform as well as those for sequence labeling, but they are still promising.
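The input handling described above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual code: the function names and the toy `step` function are assumptions; only the split into partial sequences of 200 tokens, the relative-position encoding, and the carry-over of the final hidden state into the next partial sequence come from the abstract.

```python
def relative_positions(boxes):
    """Encode each word's location as the offset (dx, dy) from the
    previous word's position; the first word gets offset (0, 0)."""
    rel = [(0, 0)]
    for (x_prev, y_prev), (x, y) in zip(boxes, boxes[1:]):
        rel.append((x - x_prev, y - y_prev))
    return rel

def split_sequences(tokens, size=200):
    """Split one document into partial sequences of at most `size`
    tokens, to be processed in order."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def run_stateful(chunks, step, h0):
    """Process the partial sequences in order, re-using the final
    hidden state of each chunk as the initial state of the next,
    as in the stateful LSTM setup. `step` stands in for one RNN step."""
    h = h0
    outputs = []
    for chunk in chunks:
        out = []
        for x in chunk:
            h = step(h, x)
            out.append(h)
        outputs.append(out)
    return outputs, h
```

In a real model the `step` function would be an LSTM cell and the relative offsets would be concatenated to each word's embedding; the point here is only the data flow: chunk, encode positions, and thread the hidden state across chunks of the same document.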

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-200658
Date: January 2017
Creators: Marc, Romeyn
Publisher: KTH, Skolan för datavetenskap och kommunikation (CSC)
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
