Global ETD Search

Unified detection and recognition for reading text in scene images

Although an automated reader for the blind first appeared nearly two hundred years ago, computers can currently "read" document text only about as well as a seven-year-old. Scene text recognition brings many new challenges. A central limitation of current approaches is a feed-forward, bottom-up, pipelined architecture that isolates the many tasks and information sources involved in reading. The result is a system that commits errors from which it cannot recover and has components that lack access to relevant information. We propose a system for scene text reading that is more integrated in its design, training, and operation. First, we present a simple contextual model for text detection that is ignorant of any recognition. Through the use of special features and data context, this model performs well on the detection task, but limitations remain due to the lack of interpretation. We then introduce a recognition model that integrates several information sources, including font consistency and a lexicon, and compare it to approaches using pipelined architectures with similar information. Next, we examine a more unified detection and recognition framework in which features are selected based on the joint task of detection and recognition, rather than on each task individually. This approach yields better results with fewer features. Finally, we demonstrate a model that incorporates segmentation and recognition at both the character and word levels. Text with difficult layouts and low resolution is more accurately recognized by this integrated approach. By more tightly coupling several aspects of detection and recognition, we hope to establish a new unified way of approaching the problem that will lead to improved performance. We would like computers to become accomplished grammar-school level readers.

https://scholarworks.umass.edu/dissertations/AAI3325128

Artificial intelligence|Computer science

Identifier	oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-5160
Date	01 January 2008
Creators	Weinman, Jerod J
Publisher	ScholarWorks@UMass Amherst
Source Sets	University of Massachusetts, Amherst
Language	English
Detected Language	English
Type	text
Source	Doctoral Dissertations Available from Proquest

