Return to search

Cursive Bengali Script Recognition for Indian Postal Automation

Large variations in writing styles and difficulties in segmenting cursive words are the main reasons for handwritten cursive words recognition for being such a challenging task. An Indian postal document reading system based on a segmentation-free context based stochastic model is presented. The originality of the work resides on a combination of high-level perceptual features with the low-level pixel information considered by the former model and a pruning strategy in the Viterbi decoding to reduce the recognition time. While the low-level information can be easily extracted from the analyzed form, the discriminative power of such information has some limits as describes the shape with less precision. For that reason, we have considered in the framework of an analytical approach, using an implicit segmentation, the implant of high-level information reduced to a lower level. This enrichment can be perceived as a weight at pixel level, assigning an importance to each analyzed pixel based on their perceptual properties. The challenge is to combine the different type of features considering a certain dependence between them. To reduce the decoding time in the Viterbi search, a cumulative threshold mechanism is proposed in a flat lexicon representation. Instead of using a trie representation where the common prefix parts are shared we propose a threshold mechanism in the flat lexicon where based just on a partial Viterbi analysis, we can prune a model and stop the further processing. The cumulative thresholds are based on matching scores calculated at each letter level, allowing a certain dynamic and elasticity to the model. As we are interested in a complete postal address recognition system, we have also focused our attention on digit recognition, proposing different neural and stochastic solutions. To increase the accuracy and robustness of the classifiers a combination scheme is also proposed. The results obtained on different datasets written on Latin and Bengali scripts have shown the interest of the method and the recognition module developed will be integrated in a generic system for the Indian postal automation.

Identiferoai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00579806
Date12 November 2008
CreatorsVajda, Szilárd
PublisherUniversité Henri Poincaré - Nancy I
Source SetsCCSD theses-EN-ligne, France
LanguageEnglish
Detected LanguageEnglish
TypePhD thesis

Page generated in 0.002 seconds