DECO: A Dataset of Annotated Spreadsheets for Layout and Table Recognition

This paper presents DECO (Dresden Enron COrpus), a dataset of spreadsheet files annotated on the basis of layout and contents. It comprises 1,165 files extracted from the Enron corpus. Three different annotators (judges) assigned layout roles (e.g., Header, Data, and Notes) to non-empty cells and marked the borders of tables. Files that do not contain tables were flagged using categories such as Template, Form, and Report. Subsequently, a thorough analysis was performed to uncover the characteristics of the overall dataset and of specific annotations. The results are discussed in this paper, providing several takeaways for future work. Furthermore, this work describes the annotation methodology in detail, going through the individual steps. The dataset, methodology, and tools are made publicly available so that they can be adopted for further studies. DECO is available at: https://wwwdb.inf.tu-dresden.de/research-projects/deexcelarator/.

info:eu-repo/classification/ddc/004

ddc:004

Identifier	oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:82977
Date	22 June 2023
Creators	Lehner, Wolfgang, Koci, Elvis, Thiele, Maik, Rehak, Josephine, Romero, Oscar
Publisher	IEEE
Source Sets	Hochschulschriftenserver (HSSS) der SLUB Dresden
Language	English
Detected Language	English
Type	info:eu-repo/semantics/acceptedVersion, doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text
Rights	info:eu-repo/semantics/openAccess
Relation	978-1-7281-3014-9, 10.1109/ICDAR.2019.00207
