Return to search

Analýza rozložení textu v historických dokumentech / Text Layout Analysis in Historical Documents

The goal of this thesis is to design and implement algorithm for text layout analysis in historical documents. Neural network was used to solve this problem, specifically architecture Faster-RCNN. Dataset of 6 135 images with historical newspaper was used for training and testing. For purpose of the thesis four models of neural networks were trained: model for detection of words, headings, text regions and model for words detection based on position in line. Outputs from these models were processed in order to determine text layout in input image. A modified F-score metric was used for the evaluation. Based on this metric, the algorithm reached an accuracy almost 80 %.

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:445522
Date January 2021
CreatorsPalacková, Bianca
ContributorsHradiš, Michal, Kodym, Oldřich
PublisherVysoké učení technické v Brně. Fakulta informačních technologií
Source SetsCzech ETDs
LanguageCzech
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0023 seconds