Global ETD Search

1	Sistema de reconhecimento de palavras manuscritas dependente do usuário. / User-defined handwriting recognition system. VELOSO, Luciana Ribeiro. 14 August 2018. Submitted by Johnny Rodrigues (johnnyrodrigues@ufcg.edu.br) on 2018-08-14; made available in DSpace on 2018-08-14. Previous issue date: 2009-03. This work presents a writer-dependent system for recognizing isolated handwritten cursive words. The system comprises a pre-processing stage, which corrects imperfections and normalizes variations in the word image; an explicit segmentation stage, which splits the word into characters or character segments; a feature extraction stage, which represents the image by three feature vectors (perceptual, global, and directional); and a vector quantization module, which maps each feature vector to an observation vector (or symbol vector). The symbols are the indices of the code vectors produced when the feature sequence is vector-quantized against the dictionaries. Finally, classification is performed by Hidden Markov Models: characters are recognized individually and combined to form a valid word. Experiments were conducted on a database built specifically for this purpose, containing handwriting samples from 4 different writers. The system achieved recognition rates between 83.31% and 92.96%, depending on the writer analyzed. The results show that the system performs very well when words are recognized through character models. Keywords: Computer Science; Electrical Engineering; Pattern recognition; Handwritten word recognition; Document analysis; Digital image processing.
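The quantization and classification steps described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the 2-D codebook, the feature sequence, the two character models "a" and "b", and all probabilities are hypothetical values chosen for illustration, and nearest-neighbour quantization with Euclidean distance is an assumption.

```python
from math import dist

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest code vector."""
    symbols = []
    for vector in features:
        # Index of the code vector with the smallest Euclidean distance.
        nearest = min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))
        symbols.append(nearest)
    return symbols

def forward_likelihood(symbols, start, trans, emit):
    """P(symbols | model) for a discrete HMM, computed with the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][symbols[0]] for s in range(n)]
    for sym in symbols[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][sym]
            for s in range(n)
        ]
    return sum(alpha)

# Hypothetical codebook and feature sequence (assumed, not from the thesis).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
features = [(0.1, 0.2), (0.9, 1.2), (0.2, 1.8), (0.8, 0.7)]
symbols = quantize(features, codebook)

# Two hypothetical left-to-right character models: (start, transitions, emissions).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "a": ([1.0, 0.0], left_to_right, [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]),
    "b": ([1.0, 0.0], left_to_right, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

# The recognized character is the model that best explains the symbol sequence.
best = max(models, key=lambda name: forward_likelihood(symbols, *models[name]))
print(symbols, best)
```

For long symbol sequences a practical system would work with log-probabilities or scaled forward variables to avoid numerical underflow.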
2	Lexicon-Free Recognition Strategies For Online Handwritten Tamil Words. Sundaram, Suresh. 12 1900. In this thesis, we address some of the challenges involved in developing a robust writer-independent, lexicon-free system to recognize online Tamil words. Tamil, being a Dravidian language, is morphologically rich and agglutinative, and thus does not have a finite lexicon. For example, a single verb root can easily lead to hundreds of words after morphological changes and agglutination. Further, a lexicon-free recognition approach can be applied to form-filling applications, where capturing all possible names in a lexicon is cumbersome, if not impossible. Under such circumstances, one must explore the possibility of segmenting a Tamil word into its individual symbols. The modern Tamil alphabet comprises 23 consonants and 11 vowels, which combine to form a total of 313 characters (aksharas). A minimal set of 155 distinct symbols has been derived to recognize these characters. A corpus of isolated Tamil symbols (the IWFHR database) is used to derive the various statistics proposed in this work. To address the challenges of segmentation and recognition (the primary focus of the thesis), Tamil words are collected using a custom application running on a tablet PC. A set of 10000 words (comprising 53246 symbols) has been collected from high school students and used for the experiments in this thesis. We refer to this database as the ‘MILE word database’. In the first part of the work, a feedback-based word segmentation mechanism is proposed. Initially, the Tamil word is segmented based on a bounding box overlap criterion. This dominant overlap criterion segmentation (DOCS) generates a set of candidate stroke groups. Thereafter, certain attributes of the resulting stroke groups are examined to detect any possible splits or under-segmentations.
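The bounding box overlap idea behind the initial segmentation can be sketched as follows. The abstract does not state the exact criterion, so the overlap measure (horizontal overlap relative to the narrower extent), the `threshold` value, and the function names are all assumptions made for illustration.

```python
def x_extent(stroke):
    """Horizontal extent (min x, max x) of a stroke given as (x, y) points."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)

def overlap_ratio(a, b):
    """Horizontal overlap of extents a and b, relative to the narrower one."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    narrower = min(a[1] - a[0], b[1] - b[0])
    if narrower <= 0:
        # Zero-width extent (for example a dot): overlapping if it lies inside the other.
        return 1.0 if overlap >= 0 else 0.0
    return max(overlap, 0.0) / narrower

def group_strokes(strokes, threshold=0.2):
    """Merge each stroke into the previous group when their extents overlap enough."""
    groups = []   # each group is a list of stroke indices, in writing order
    extents = []  # running horizontal extent of each group
    for index, stroke in enumerate(strokes):
        extent = x_extent(stroke)
        if groups and overlap_ratio(extents[-1], extent) > threshold:
            groups[-1].append(index)
            lo, hi = extents[-1]
            extents[-1] = (min(lo, extent[0]), max(hi, extent[1]))
        else:
            groups.append([index])
            extents.append(extent)
    return groups

# Hypothetical strokes: two overlapping strokes, then a stroke with a dot above it.
strokes = [
    [(0, 0), (4, 5)],
    [(3, 1), (5, 4)],
    [(7, 0), (10, 5)],
    [(8, 6), (8, 6)],
]
groups = group_strokes(strokes)
print(groups)  # [[0, 1], [2, 3]]
```

A purely geometric rule like this one will sometimes merge or split symbols incorrectly, which is the situation a later feedback stage is meant to catch.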
By relying on feedback from (i) a priori knowledge of attributes such as the number of dominant points and inter-stroke displacements, (ii) the recognition label and likelihood of the primary SVM classifier, and (iii) linguistic knowledge of the detected stroke groups, a decision is taken on whether to correct a stroke group. Accordingly, we call the proposed segmentation ‘attention feedback segmentation’ (AFS). Across the words in the MILE word database, a segmentation rate of 99.7% is achieved at symbol level with AFS. The high segmentation rate (with feedback) in turn improves the symbol recognition rate of the primary SVM classifier from 83.9% (with DOCS alone) to 88.4%. For the problem of segmentation, the SVM classifier fed with the x-y trace of the normalized and resampled online stroke groups is quite effective. However, the classifier is not robust enough to distinguish between many sets of similar-looking symbols. To improve the symbol recognition performance, we explore two approaches, namely reevaluation strategies and language models. The reevaluation techniques, in particular, resolve the ambiguities in base consonants, pure consonants and vowel modifiers to a considerable extent. For the frequently confused sets (derived from the confusion matrix), a dynamic time warping (DTW) approach is proposed to automatically extract their discriminative regions. For each confusion set, novel localized cues are derived from the discriminative region for disambiguation. The proposed features are quite promising in improving the symbol recognition performance on the confusion sets. A comparative experimental analysis of these features against x-y coordinates is performed to judge their discriminative power. The confusions are resolved with expert networks, each comprising a discriminative region extractor, a feature extractor and an SVM.
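Dynamic time warping, mentioned above as the tool for locating discriminative regions, aligns two point sequences of different lengths. The sketch below shows only the generic DTW alignment (cost and warping path) on hypothetical traces; how the thesis turns the alignment into a discriminative region is not described in the abstract and is not reproduced here.

```python
from math import dist, inf

def dtw(a, b):
    """Return the DTW cost and warping path between point sequences a and b."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative distance aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    # Backtrack from the end to recover which points were aligned.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path

# Hypothetical traces: the second one dwells longer on the middle point.
trace_a = [(0, 0), (1, 0), (2, 0)]
trace_b = [(0, 0), (1, 0), (1, 0), (2, 0)]
total, path = dtw(trace_a, trace_b)
print(total, path)
```

One plausible use of the path, stated here as an assumption, is to inspect the local distances of the aligned pairs: stretches where two confusable symbols stay far apart are candidates for the region that distinguishes them.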
The proposed techniques improve the symbol recognition rate by 3.5 percentage points (from 88.4% to 91.9%) on the MILE word database over the primary SVM classifier. In the final part of the thesis, we integrate linguistic knowledge (derived from a text corpus) into the primary recognition system. The biclass, bigram and unigram language models at symbol level are compared in terms of recognition performance. Among the three models, the bigram model is shown to give the highest recognition accuracy. A class reduction approach for recognition is adopted by incorporating the language bigram model at the akshara level. Lastly, a judicious combination of reevaluation techniques with language models is proposed. Overall, an improvement of up to 4.7 percentage points (from 88.4% to 93.1%) in symbol-level accuracy is achieved. The writer-independent and lexicon-free segmentation-recognition approach developed in this thesis for online handwritten Tamil word recognition is promising. The best performance of 93.1% (achieved at symbol level) is comparable to the highest reported accuracy in the literature for Tamil symbols. However, the latter was obtained on a database of isolated symbols (the IWFHR competition test dataset), whereas our accuracy is measured on a database of 10000 words and is thus a product of segmentation and classifier accuracies. The recognition performance may be enhanced further by experimenting with and choosing the best set of features and classifiers. The word recognition performance can also be improved very significantly by using a lexicon. However, these issues are not addressed by the thesis. We hope that the lexicon-free experiments reported in this work will serve as a benchmark for future efforts. Keywords: Online Tamil Word Recognition; Online Handwriting Recognition; Online Tamil Words - Segmentation; Online Tamil Symbols - Reevaluation; Tamil Word Recognition - Language Models; Handwritten Recognition; Online Handwritten Words; Online Handwritten Indic Words; Computer Science.
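The effect of a symbol-level bigram model on recognition can be sketched with a small Viterbi-style rescoring routine. This is an illustration under stated assumptions, not the thesis's method: the romanized symbol names, the classifier probabilities, the bigram probabilities, and the `floor` and `weight` parameters are all hypothetical.

```python
from math import log

def rescore(candidates, bigram, floor=1e-6, weight=1.0):
    """Pick the symbol sequence maximizing classifier plus weighted bigram log-scores.

    candidates: one dict per symbol position, mapping symbol -> classifier probability.
    bigram: dict mapping (previous symbol, symbol) -> probability; unseen pairs get `floor`.
    """
    # best[s] = (score, sequence) of the best path so far that ends in symbol s.
    best = {s: (log(p), [s]) for s, p in candidates[0].items()}
    for position in candidates[1:]:
        following = {}
        for s, p in position.items():
            score, seq = max(
                (prev_score + weight * log(bigram.get((prev, s), floor)), prev_seq)
                for prev, (prev_score, prev_seq) in best.items()
            )
            following[s] = (score + log(p), seq + [s])
        best = following
    return max(best.values())[1]

# Hypothetical classifier outputs for a two-symbol word.
candidates = [
    {"ka": 0.6, "ta": 0.4},
    {"la": 0.55, "ta": 0.45},
]
# Hypothetical bigram probabilities, as if estimated from a text corpus.
bigram = {
    ("ka", "la"): 0.05,
    ("ka", "ta"): 0.5,
    ("ta", "la"): 0.1,
    ("ta", "ta"): 0.1,
}

with_lm = rescore(candidates, bigram)
without_lm = rescore(candidates, bigram, weight=0.0)
print(with_lm, without_lm)
```

With `weight=0.0` the routine reduces to taking the classifier's top choice at each position, so comparing the two results shows where the language model overrides the classifier.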
