
Mathematical Expression Detection and Segmentation in Document Images

Various document layout analysis techniques are used to improve the accuracy of optical character recognition (OCR) on document images. Type-specific document layout analysis involves localizing and segmenting specific zones in an image so that they can be recognized by specialized OCR modules. Zones of interest include titles, headers/footers, paragraphs, images, mathematical expressions, chemical equations, musical notation, tables, and circuit diagrams, among others. False positive and false negative detections, oversegmentations, and undersegmentations made during the detection and segmentation stage will confuse a specialized OCR system and may therefore result in garbled, incoherent output. In this work, a mathematical expression detection and segmentation (MEDS) module is implemented and thoroughly evaluated. The module is fully integrated with the open-source OCR software Tesseract and is designed to function as one of its components. Evaluation is carried out on freely available public domain images so that existing and future techniques can be objectively compared. / Master of Science
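The abstract names four kinds of detection and segmentation error but does not spell out how they are counted. The sketch below is a hypothetical illustration, not the thesis's evaluation protocol or Tesseract code. It assumes zones are axis-aligned boxes `(x0, y0, x1, y1)`, and it treats a ground-truth zone and a detected zone as "associated" when their intersection covers at least 50% of the smaller box. The 0.5 threshold and the helper names `associated` and `classify` are assumptions chosen for the example. A ground-truth zone with no associated detection is a false negative, a detection with no associated ground truth is a false positive, one ground-truth zone split across several detections is an oversegmentation, and one detection spanning several ground-truth zones is an undersegmentation.

```python
# Hypothetical sketch: classify math-zone detection errors by box overlap.
# Box format (x0, y0, x1, y1) and the 0.5 coverage threshold are assumptions.
Box = tuple[int, int, int, int]


def area(b: Box) -> int:
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def intersection(a: Box, b: Box) -> int:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def associated(g: Box, d: Box, threshold: float = 0.5) -> bool:
    # Overlap measured relative to the smaller box, so a fragment of a
    # larger zone still counts as touching it.
    smaller = min(area(g), area(d))
    return smaller > 0 and intersection(g, d) / smaller >= threshold


def classify(ground_truth: list[Box], detections: list[Box],
             threshold: float = 0.5) -> dict[str, int]:
    # For each ground-truth zone, the detections it is associated with, and vice versa.
    links_g = [[j for j, d in enumerate(detections) if associated(g, d, threshold)]
               for g in ground_truth]
    links_d = [[i for i, g in enumerate(ground_truth) if associated(g, d, threshold)]
               for d in detections]
    counts = {"correct": 0, "false_negative": 0, "false_positive": 0,
              "oversegmented": 0, "undersegmented": 0}
    for js in links_g:
        if not js:
            counts["false_negative"] += 1
        elif len(js) > 1:
            counts["oversegmented"] += 1
        elif len(links_d[js[0]]) == 1:
            counts["correct"] += 1  # strict one-to-one match
    for is_ in links_d:
        if not is_:
            counts["false_positive"] += 1
        elif len(is_) > 1:
            counts["undersegmented"] += 1
    return counts


# Made-up boxes producing one of each outcome.
gt = [(0, 0, 100, 20), (0, 40, 100, 60), (0, 80, 100, 100),
      (0, 150, 100, 170), (0, 300, 100, 320)]
dets = [(0, 0, 48, 20), (52, 0, 100, 20),  # one zone split in two
        (0, 38, 100, 102),                  # two zones merged into one
        (200, 200, 250, 220),               # spurious detection
        (2, 150, 98, 170)]                  # clean match
print(classify(gt, dets))
```

Counting from both the ground-truth side and the detection side keeps the two kinds of segmentation error distinct: a merged detection is reported once as an undersegmentation rather than once per ground-truth zone it swallows.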

document layout analysis

optical character recognition

document image

type-specific layout analysis

Identifier	oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/46724
Date	19 March 2014
Creators	Bruce, Jacob Robert
Contributors	Electrical and Computer Engineering, Abbott, A. Lynn, Hsiao, Michael S., Xuan, Jianhua
Publisher	Virginia Tech
Source Sets	Virginia Tech Theses and Dissertation
Detected Language	English
Type	Thesis
Format	ETD, application/pdf
Rights	In Copyright, http://rightsstatements.org/vocab/InC/1.0/
