Layout Detection and Table Recognition: Recent Challenges in Digitizing Historical Documents and Handwritten Tabular Data

In this paper, we discuss the computer-aided processing of handwritten tabular
records of historical weather data. The observationes meteorologicae, which are housed by the
Regensburg University Library, are one of the oldest collections of weather data in Europe.
Starting in 1771, meteorological data was consistently documented in a standardized form
over almost 60 years by several writers. The tabular structure, as well as the unconstrained
textual layout of comments and the use of historical characters, pose various challenges
in layout and text recognition. We present a customized strategy to digitize tabular and
handwritten data by combining various state-of-the-art methods for OCR processing to fit
the collection. Since the recognition of historical documents still poses major challenges,
we provide lessons learned from experimental testing during the first project stages. Our
results show that deep learning methods can be used for text recognition and layout detection.
However, they are less efficient for the recognition of tabular structures. Furthermore,
a tailored approach had to be developed for the historical meteorological characters during
the manual creation of ground truth data. The customized system achieved an accuracy
rate of 82% for the text recognition of the heterogeneous handwriting and 87% accuracy
for layout recognition of the tables.

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:92003
Date: 11 June 2024
Creators: Lehenmeier, Constantin; Burghardt, Manuel; Mischka, Bernadette
Publisher: Springer
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/acceptedVersion, doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
Relation: https://doi.org/10.1007/978-3-030-54956-5_17
