11 |
Intelligent Indexing: A Semi-Automated, Trainable System for Field Labeling. Clawson, Robert T. 01 September 2014 (has links) (PDF)
We present Intelligent Indexing: a general, scalable, collaborative approach to the indexing and transcription of non-machine-readable documents that exploits visual consensus and group labeling while harnessing human recognition and domain expertise. In our system, indexers work directly on the page and, with minimal context switching, can navigate the page, enter labels, and interact with the recognition engine. Interaction with the recognition engine occurs through preview windows that allow the indexer to quickly verify and correct recommendations, which is far more efficient than conventional, tedious post-correction and editing. Intelligent Indexing is a trainable system that improves over time and can provide benefit even without prior knowledge. A user study compared Intelligent Indexing to a basic, manual indexing system. Volunteers reported that Intelligent Indexing is less mentally fatiguing and more enjoyable than the manual system, and their results show that it reduces the time required to index census records by 30.2% while maintaining comparable accuracy. A video overview of this research is available on YouTube: https://www.youtube.com/watch?v=gqdVzEPnBEw
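The verify-or-correct loop described above can be summarized in a short sketch (hypothetical function and field names, not the system's actual code): each label proposed by the recognition engine is either accepted by the indexer or replaced, and every confirmed label is fed back to the engine as a new training example, which is how the system improves over time.

    def index_field(field_image, recognizer, ask_indexer):
        # Recognition engine proposes a label; the preview window shows it.
        suggestion, confidence = recognizer.predict(field_image)
        # Indexer verifies the suggestion or types a correction.
        label = ask_indexer(field_image, suggestion)
        # Confirmed labels become training data, so later suggestions improve.
        recognizer.train_on(field_image, label)
        return label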
|
12 |
Freeform Cursive Handwriting Recognition Using a Clustered Neural Network. Bristow, Kelly H. 08 1900 (has links)
Optical character recognition (OCR) software has advanced greatly in recent years. Machine-printed text can be scanned and converted to searchable text with word accuracy rates around 98%. Reasonably neat hand-printed text can be recognized with about 85% word accuracy. However, cursive handwriting remains a challenge, with state-of-the-art performance still around 75%. Algorithms based on hidden Markov models have been only moderately successful, while recurrent neural networks have delivered the best results to date. This thesis explored the feasibility of using a special type of feedforward neural network to convert freeform cursive handwriting to searchable text. The hidden nodes in this network were grouped into clusters, with each cluster trained to recognize a unique character bigram. The network was trained on writing samples that were pre-segmented and annotated. Post-processing was facilitated in part by using the network to identify overlapping bigrams that were then linked together to form words and sentences. With dictionary-assisted post-processing, the network achieved word accuracy of 66.5% on a small, proprietary corpus. The contributions of this thesis are threefold: 1) the novel clustered architecture of the feedforward neural network, 2) the development of an expanded set of observers combining image masks, modifiers, and feature characterizations, and 3) the use of overlapping bigrams as the textual working unit to assist in context analysis and reconstruction.
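The overlapping-bigram post-processing step lends itself to a small illustration. The sketch below is a toy example, under the assumption that the network reports the bigrams it detects in reading order (it is not the thesis code): adjacent bigram detections that share a character are chained back into a word.

    def link_bigrams(bigrams):
        # Chain bigram detections such as ["ca", "at", "ts"] into "cats".
        if not bigrams:
            return ""
        word = bigrams[0]
        for bigram in bigrams[1:]:
            if word[-1] == bigram[0]:   # bigrams overlap by one character
                word += bigram[1]
            else:                       # no overlap: fall back to concatenation
                word += bigram
        return word

    print(link_bigrams(["ca", "at", "ts"]))  # -> "cats"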
|
13 |
Deep Neural Networks for Large Vocabulary Handwritten Text Recognition / Réseaux de Neurones Profonds pour la Reconnaissance de Texte Manuscrit à Large Vocabulaire. Bluche, Théodore. 13 May 2015 (has links)
La transcription automatique du texte dans les documents manuscrits a de nombreuses applications, allant du traitement automatique des documents à leur indexation ou leur compréhension. L'une des approches les plus populaires de nos jours consiste à parcourir l'image d'une ligne de texte avec une fenêtre glissante, de laquelle un certain nombre de caractéristiques sont extraites, et modélisées par des Modèles de Markov Cachés (MMC). Quand ils sont associés à des réseaux de neurones, comme des Perceptrons Multi-Couches (PMC) ou Réseaux de Neurones Récurrents de type Longue Mémoire à Court Terme (RNR-LMCT), et à un modèle de langue, ces modèles produisent de bonnes transcriptions. D'autre part, dans de nombreuses applications d'apprentissage automatique, telles que la reconnaissance de la parole ou d'images, des réseaux de neurones profonds, comportant plusieurs couches cachées, ont récemment permis une réduction significative des taux d'erreur. Dans cette thèse, nous menons une étude poussée de différents aspects de modèles optiques basés sur des réseaux de neurones profonds dans le cadre de systèmes hybrides réseaux de neurones / MMC, dans le but de mieux comprendre et évaluer leur importance relative. Dans un premier temps, nous montrons que des réseaux de neurones profonds apportent des améliorations cohérentes et significatives par rapport à des réseaux ne comportant qu'une ou deux couches cachées, et ce quel que soit le type de réseau étudié, PMC ou RNR, et d'entrée du réseau, caractéristiques ou pixels. Nous montrons également que les réseaux de neurones utilisant les pixels directement ont des performances comparables à ceux utilisant des caractéristiques de plus haut niveau, et que la profondeur des réseaux est un élément important de la réduction de l'écart de performance entre ces deux types d'entrées, confirmant la théorie selon laquelle les réseaux profonds calculent des représentations pertinentes, de complexités croissantes, de leurs entrées, en apprenant les caractéristiques de façon automatique. Malgré la domination flagrante des RNR-LMCT dans les publications récentes en reconnaissance d'écriture manuscrite, nous montrons que des PMCs profonds atteignent des performances comparables. De plus, nous avons évalué plusieurs critères d'entraînement des réseaux. Avec un entraînement discriminant de séquences, nous reportons, pour des systèmes PMC/MMC, des améliorations comparables à celles observées en reconnaissance de la parole. Nous montrons également que la méthode de Classification Temporelle Connexionniste est particulièrement adaptée aux RNRs. Enfin, la technique du dropout a récemment été appliquée aux RNR. Nous avons testé son effet à différentes positions relatives aux connexions récurrentes des RNRs, et nous montrons l'importance du choix de ces positions. Nous avons mené nos expériences sur trois bases de données publiques, qui représentent deux langues (l'anglais et le français), et deux époques, en utilisant plusieurs types d'entrées pour les réseaux de neurones : des caractéristiques prédéfinies, et les simples valeurs de pixels. Nous avons validé notre approche en participant à la compétition HTRtS en 2014, où nous avons obtenu la deuxième place. Les résultats des systèmes présentés dans cette thèse, avec les deux types de réseaux de neurones et d'entrées, sont comparables à l'état de l'art sur les bases Rimes et IAM, et leur combinaison dépasse les meilleurs résultats publiés sur les trois bases considérées.
/ The automatic transcription of text in handwritten documents has many applications, from automatic document processing to indexing and document understanding. One of the most popular approaches nowadays consists in scanning the text line image with a sliding window, from which features are extracted and modeled by Hidden Markov Models (HMMs). Associated with neural networks, such as Multi-Layer Perceptrons (MLPs) or Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs), and with a language model, these models yield good transcriptions. On the other hand, in many machine learning applications, including speech recognition and computer vision, deep neural networks consisting of several hidden layers have recently produced a significant reduction of error rates. In this thesis, we have conducted a thorough study of different aspects of optical models based on deep neural networks in the hybrid neural network / HMM scheme, in order to better understand and evaluate their relative importance. First, we show that deep neural networks produce consistent and significant improvements over networks with one or two hidden layers, independently of the kind of neural network, MLP or RNN, and of input, handcrafted features or pixels. Then, we show that deep neural networks with pixel inputs compete with those using handcrafted features, and that depth plays an important role in reducing the performance gap between the two kinds of inputs, supporting the idea that deep neural networks effectively build hierarchical and relevant representations of their inputs, and that features are automatically learnt along the way. Despite the dominance of LSTM-RNNs in the recent handwriting recognition literature, we show that deep MLPs achieve comparable results. Moreover, we evaluated different training criteria. With sequence-discriminative training, we report improvements for MLP/HMMs similar to those observed in speech recognition. We also show how the Connectionist Temporal Classification framework is especially suited to RNNs. Finally, the novel dropout technique for regularizing neural networks was recently applied to LSTM-RNNs. We tested its effect at different positions in LSTM-RNNs, thus extending previous work, and we show that its position relative to the recurrent connections is important. We conducted the experiments on three public databases, representing two languages (English and French) and two epochs, using different kinds of neural network inputs: handcrafted features and pixels. We validated our approach by taking part in the HTRtS contest in 2014, where we obtained second place. The results of the final systems presented in this thesis, namely MLPs and RNNs, with handcrafted feature or pixel inputs, are comparable to the state of the art on Rimes and IAM. Moreover, the combination of these systems outperformed all published results on the three considered databases.
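As a concrete, heavily simplified illustration of the kind of optical model discussed above, the sketch below builds a deep bidirectional LSTM over sliding-window feature frames and trains it with the Connectionist Temporal Classification criterion. It assumes PyTorch and uses dummy data and layer sizes; it is not the thesis's actual system.

    import torch
    import torch.nn as nn

    class BLSTMOpticalModel(nn.Module):
        def __init__(self, n_features=56, n_hidden=128, n_layers=3, n_chars=80):
            super().__init__()
            # Several stacked recurrent layers: the "depth" studied in the thesis.
            self.rnn = nn.LSTM(n_features, n_hidden, num_layers=n_layers,
                               bidirectional=True)
            self.out = nn.Linear(2 * n_hidden, n_chars + 1)  # +1 for the CTC blank

        def forward(self, x):                    # x: (frames, batch, n_features)
            h, _ = self.rnn(x)
            return self.out(h).log_softmax(-1)   # per-frame character log-probabilities

    model = BLSTMOpticalModel()
    ctc = nn.CTCLoss(blank=80)                   # CTC needs no frame-level alignment
    frames, batch, target_len = 200, 4, 30
    x = torch.randn(frames, batch, 56)                    # dummy sliding-window features
    targets = torch.randint(0, 80, (batch, target_len))   # dummy character labels
    loss = ctc(model(x), targets,
               input_lengths=torch.full((batch,), frames, dtype=torch.long),
               target_lengths=torch.full((batch,), target_len, dtype=torch.long))
    loss.backward()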
|
14 |
Um framework para desenvolvimento de interfaces multimodais em aplicações de computação ubíqua / A framework for multimodal interfaces development in ubiquitous computing applications. Inacio Junior, Valter dos Reis. 26 April 2007 (has links)
Interfaces multimodais processam vários tipos de entrada do usuário, tais como voz, gestos e interação com caneta, de uma maneira combinada e coordenada com a saída multimídia do sistema. Aplicações que suportam a multimodalidade provêem um modo mais natural e flexível para a execução de tarefas em computadores, uma vez que permitem que usuários com diferentes níveis de habilidades escolham o modo de interação que melhor se adequa às suas necessidades. O uso de interfaces que fogem do estilo convencional de interação baseado em teclado e mouse vai de encontro ao conceito de computação ubíqua, que tem se estabelecido como uma área de pesquisa que estuda os aspectos tecnológicos e sociais decorrentes da integração de sistemas e dispositivos computacionais à ambientes. Nesse contexto, o trabalho aqui reportado visou investigar a implementação de interfaces multimodais em aplicações de computação ubíqua, por meio da construção de um framework de software para integração de modalidades de escrita e voz / Multimodal interfaces process several types of user input, such as voice, gestures, and pen interaction, in a manner combined and coordinated with the system's multimedia output. Applications that support multimodality provide a more natural and flexible way to execute tasks with computers, since they allow users with different levels of ability to choose the mode of interaction that best fits their needs. The use of interfaces that depart from the conventional keyboard-and-mouse style of interaction aligns with the concept of ubiquitous computing, which has been established as a research area studying the social and technological aspects arising from the integration of computing systems and devices into environments. In this context, the work reported here investigated the implementation of multimodal interfaces in ubiquitous computing applications by building a software framework for integrating handwriting and speech modalities.
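As a toy illustration of what "combined and coordinated" multimodal input can mean in practice (hypothetical types and function names, not the framework's API), the sketch below fuses pen and speech events that arrive within a short time window into a single multimodal command.

    from dataclasses import dataclass

    @dataclass
    class ModalityEvent:
        modality: str       # "pen" or "speech"
        payload: str        # recognized text or gesture label
        timestamp: float    # seconds

    def fuse(events, window=1.5):
        # Pair a pen event with a speech event occurring within `window` seconds.
        fused, pending = [], []
        for event in sorted(events, key=lambda e: e.timestamp):
            pending = [p for p in pending if event.timestamp - p.timestamp <= window]
            partner = next((p for p in pending if p.modality != event.modality), None)
            if partner is not None:
                fused.append((partner, event))
                pending.remove(partner)
            else:
                pending.append(event)
        return fused

    print(fuse([ModalityEvent("speech", "delete this word", 10.0),
                ModalityEvent("pen", "circle(word_3)", 10.4)]))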
|
16 |
End-to-End Full-Page Handwriting Recognition. Wigington, Curtis Michael. 01 May 2018 (has links)
Despite decades of research, offline handwriting recognition (HWR) of historical documents remains a challenging problem which, if solved, could greatly improve the searchability of online cultural heritage archives. Historical documents are plagued with noise, degradation, ink bleed-through, overlapping strokes, variation in the slope and slant of the writing, and inconsistent layouts. Often the documents in a collection have been written by thousands of authors, all of whom have significantly different writing styles. In order to better capture the variations in writing styles, we introduce a novel data augmentation technique. This method achieves state-of-the-art results on modern datasets written in English and French and on a historical dataset written in German. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network that finds the start position of handwriting lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) handwriting into dewarped images, and a CNN-LSTM network that reads the characters. SFR exceeds the performance of the winner of the ICDAR 2017 handwriting recognition competition, even when not using the provided competition region annotations.
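The three-stage structure of SFR can be summarized in a short sketch (hypothetical component names; the released SFR code differs): start-of-line proposals feed a line follower that produces dewarped line images, which a CNN-LSTM recognizer then transcribes.

    def recognize_page(page_image, start_of_line_net, line_follower, reader):
        # Stage 1: propose the start position of every handwriting line.
        transcripts = []
        for start in start_of_line_net(page_image):
            # Stage 2: incrementally follow the (possibly curved) line and dewarp it.
            line_image = line_follower(page_image, start)
            # Stage 3: read the characters of the dewarped line.
            transcripts.append(reader(line_image))
        return "\n".join(transcripts)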
|
17 |
Zařízení pro napodobení statických a dynamických vlastností písma / Device for Imitation of Static and Dynamic Handwriting Characteristics. Pawlus, Jan. January 2019 (has links)
This project deals with designing and assembling a system for imitating static and dynamic handwriting characteristics. The system's design centers on a special pen for capturing the handwriting characteristics, on processing those characteristics, and on imitating them with a 3D printer altered for this purpose. The topic is interesting because research in this specific field that includes a real demonstration of how a signature can be forged from dynamic handwriting characteristics, rather than by a forger's hand, barely exists. The problem with preventing forgery is that we need to know the attack well, which is the clear motivation for this project.
|
18 |
A Multitask Learning Encoder-N-Decoder Framework for Movie and Video Description. Nina, Oliver A. 11 October 2018 (has links)
No description available.
|
19 |
Deep Learning for Document Image Analysis. Tensmeyer, Christopher Alan. 01 April 2019 (has links)
Automatic machine understanding of documents from image inputs enables many applications in modern document workflows, digital archives of historical documents, and general machine intelligence, among others. Together, the techniques for understanding document images comprise the field of Document Image Analysis (DIA). Within DIA, the research community has identified several sub-problems, such as page segmentation and Optical Character Recognition (OCR). As the field has matured, there has been a trend of moving away from heuristic-based methods, designed for particular tasks and domains of documents, and towards machine learning methods that learn to solve tasks from examples of input/output pairs. Within machine learning, a particular class of models, known as deep learning models, has established itself as the state of the art for many image-based applications, including DIA. While traditional machine learning models typically operate on features designed by researchers, deep learning models are able to learn task-specific features directly from raw pixel inputs. This dissertation is a collection of papers that propose several deep learning models to solve a variety of tasks within DIA. The first task is historical document binarization, where an input image of a degraded historical document is converted to a bi-tonal image to separate foreground text from background regions. The next part of the dissertation considers document segmentation problems, including identifying the boundary between the document page and its background, as well as segmenting an image of a data table into rows, columns, and cells. Finally, a variety of deep models are proposed to solve recognition tasks. These tasks include whole document image classification, identifying the font of a given piece of text, and transcribing handwritten text in low-resource languages.
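As a minimal sketch of how binarization can be cast as per-pixel classification (assuming PyTorch, dummy data, and a deliberately tiny network; the dissertation's architectures are considerably more elaborate), a small fully convolutional model maps a grayscale page patch to a foreground/background mask.

    import torch
    import torch.nn as nn

    binarizer = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, kernel_size=1),                  # one logit per pixel
    )

    image = torch.rand(1, 1, 256, 256)                    # dummy degraded page patch
    target = (torch.rand(1, 1, 256, 256) > 0.9).float()   # dummy ground-truth ink mask
    loss = nn.BCEWithLogitsLoss()(binarizer(image), target)
    loss.backward()
    bitonal = torch.sigmoid(binarizer(image)) > 0.5       # bi-tonal output image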
|
20 |
Mathematical Expression Recognition based on Probabilistic Grammars. Álvaro Muñoz, Francisco. 15 June 2015 (has links)
[EN] Mathematical notation is well-known and used all over the
world. Humankind has evolved from simple methods for representing
counts to the current well-defined math notation able to account for
complex problems. Furthermore, mathematical expressions constitute a
universal language in scientific fields, and many information
resources containing mathematics have been created during the last
decades. However, in order to efficiently access all that information,
scientific documents have to be digitized or produced directly in
electronic formats.
Although most people are able to understand and produce mathematical
information, introducing math expressions into electronic devices
requires learning specific notations or using editors. Automatic
recognition of mathematical expressions aims at filling this gap
between the knowledge of a person and the input accepted by
computers. This way, printed documents containing math expressions
could be automatically digitized, and handwriting could be used for
direct input of math notation into electronic devices.
This thesis is devoted to develop an approach for mathematical
expression recognition. In this document we propose an approach for
recognizing any type of mathematical expression (printed or
handwritten) based on probabilistic grammars. In order to do so, we
develop the formal statistical framework that derives several
probability distributions. Along the document, we deal with the
definition and estimation of all these probabilistic sources of
information. Finally, we define the parsing algorithm that globally
computes the most probable mathematical expression for a given input
according to the statistical framework.
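As a toy illustration of parsing with probabilistic grammars (a one-dimensional sketch with a made-up grammar, not the thesis software, which handles the two-dimensional layout of math expressions), the probabilistic CYK routine below computes the probability of the best derivation of a token sequence under a grammar in Chomsky normal form.

    from collections import defaultdict

    # Toy grammar in Chomsky normal form: lhs -> list of (rhs tuple, probability).
    rules = {
        "Expr": [(("Expr", "PlusTerm"), 0.4), (("digit",), 0.6)],
        "PlusTerm": [(("Plus", "Expr"), 1.0)],
        "Plus": [(("plus",), 1.0)],
    }

    def best_parse_probability(tokens, rules, start="Expr"):
        n = len(tokens)
        best = defaultdict(float)                 # (i, j, symbol) -> best probability
        for i, token in enumerate(tokens):        # terminal rules fill length-1 spans
            for lhs, productions in rules.items():
                for rhs, p in productions:
                    if rhs == (token,):
                        best[(i, i + 1, lhs)] = max(best[(i, i + 1, lhs)], p)
        for span in range(2, n + 1):              # combine adjacent sub-spans bottom-up
            for i in range(n - span + 1):
                j = i + span
                for split in range(i + 1, j):
                    for lhs, productions in rules.items():
                        for rhs, p in productions:
                            if len(rhs) == 2:
                                q = p * best[(i, split, rhs[0])] * best[(split, j, rhs[1])]
                                best[(i, j, lhs)] = max(best[(i, j, lhs)], q)
        return best[(0, n, start)]

    print(best_parse_probability(["digit", "plus", "digit"], rules))  # -> 0.144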
An important point in this study is to provide objective performance
evaluation and report results using public data and standard
metrics. We inspected the problems of automatic evaluation in this
field and looked for the best solutions. We also report several
experiments using public databases and we participated in several
international competitions. Furthermore, we have released most of the
software developed in this thesis as open source.
We also explore some of the applications of mathematical expression
recognition. In addition to the direct applications of transcription
and digitization, we report two important proposals. First, we
developed mucaptcha, a method to tell humans and computers apart by
means of math handwriting input, which represents a novel application
of math expression recognition. Second, we tackled the problem of
layout analysis of structured documents using the statistical
framework developed in this thesis, because both are two-dimensional
problems that can be modeled with probabilistic grammars.
The approach developed in this thesis for mathematical expression
recognition has obtained good results at different levels. It has
produced several scientific publications in international conferences
and journals, and has been awarded in international competitions. / [ES] La notación matemática es bien conocida y se utiliza en todo el
mundo. La humanidad ha evolucionado desde simples métodos para
representar cuentas hasta la notación formal actual capaz de modelar
problemas complejos. Además, las expresiones matemáticas constituyen
un idioma universal en el mundo científico, y se han creado muchos
recursos que contienen matemáticas durante las últimas décadas. Sin
embargo, para acceder de forma eficiente a toda esa información, los
documentos científicos han de ser digitalizados o producidos
directamente en formatos electrónicos.
Aunque la mayoría de personas es capaz de entender y producir
información matemática, introducir expresiones matemáticas en
dispositivos electrónicos requiere aprender notaciones especiales o
usar editores. El reconocimiento automático de expresiones matemáticas
tiene como objetivo llenar ese espacio existente entre el conocimiento
de una persona y la entrada que aceptan los ordenadores. De este modo,
documentos impresos que contienen fórmulas podrían digitalizarse
automáticamente, y la escritura se podría utilizar para introducir
directamente notación matemática en dispositivos electrónicos.
Esta tesis está centrada en desarrollar un método para reconocer
expresiones matemáticas. En este documento proponemos un método para
reconocer cualquier tipo de fórmula (impresa o manuscrita) basado en
gramáticas probabilísticas. Para ello, desarrollamos el marco
estadístico formal que deriva varias distribuciones de probabilidad. A
lo largo del documento, abordamos la definición y estimación de todas
estas fuentes de información probabilística. Finalmente, definimos el
algoritmo que, dada cierta entrada, calcula globalmente la expresión
matemática más probable de acuerdo al marco estadístico.
Un aspecto importante de este trabajo es proporcionar una evaluación
objetiva de los resultados y presentarlos usando datos públicos y
medidas estándar. Por ello, estudiamos los problemas de la evaluación
automática en este campo y buscamos las mejores soluciones. Asimismo,
presentamos diversos experimentos usando bases de datos públicas y
hemos participado en varias competiciones internacionales. Además,
hemos publicado como código abierto la mayoría del software
desarrollado en esta tesis.
También hemos explorado algunas de las aplicaciones del reconocimiento
de expresiones matemáticas. Además de las aplicaciones directas de
transcripción y digitalización, presentamos dos propuestas
importantes. En primer lugar, desarrollamos mucaptcha, un método para
discriminar entre humanos y ordenadores mediante la escritura de
expresiones matemáticas, el cual representa una novedosa aplicación
del reconocimiento de fórmulas. En segundo lugar, abordamos el
problema de detectar y segmentar la estructura de documentos
utilizando el marco estadístico formal desarrollado en esta tesis,
dado que ambos son problemas bidimensionales que pueden modelarse con
gramáticas probabilísticas.
El método desarrollado en esta tesis para reconocer expresiones
matemáticas ha obtenido buenos resultados a diferentes niveles. Este
trabajo ha producido varias publicaciones en conferencias
internacionales y revistas, y ha sido premiado en competiciones
internacionales. / [CA] La notació matemàtica és ben coneguda i s'utilitza a tot el món. La
humanitat ha evolucionat des de simples mètodes per representar
comptes fins a la notació formal actual capaç de modelar
problemes complexos. A més, les expressions matemàtiques
constitueixen un idioma universal al món científic, i s'han creat
molts recursos que contenen matemàtiques durant les últimes
dècades. No obstant això, per accedir de forma eficient a tota
aquesta informació, els documents científics han de ser
digitalitzats o produïts directament en formats electrònics.
Encara que la majoria de persones és capaç d'entendre i produir
informació matemàtica, introduir expressions matemàtiques en
dispositius electrònics requereix aprendre notacions especials o usar
editors. El reconeixement automàtic d'expressions matemàtiques
té per objectiu omplir aquest espai existent entre el coneixement
d'una persona i l'entrada que accepten els ordinadors. D'aquesta
manera, documents impresos que contenen fórmules podrien
digitalitzar-se automàticament, i l'escriptura es podria utilitzar per
introduir directament notació matemàtica en dispositius electrònics.
Aquesta tesi està centrada en desenvolupar un mètode per reconèixer
expressions matemàtiques. En aquest document proposem un mètode per
reconèixer qualsevol tipus de fórmula (impresa o manuscrita) basat en
gramàtiques probabilístiques. Amb aquesta finalitat, desenvolupem el
marc estadístic formal que deriva diverses distribucions de
probabilitat. Al llarg del document, abordem la definició i estimació
de totes aquestes fonts d'informació probabilística. Finalment,
definim l'algorisme que, donada certa entrada, calcula globalment
l'expressió matemàtica més probable d'acord al marc estadístic.
Un aspecte important d'aquest treball és proporcionar una avaluació
objectiva dels resultats i presentar-los usant dades públiques i
mesures estàndard. Per això, estudiem els problemes de l'avaluació
automàtica en aquest camp i busquem les millors solucions. Així
mateix, presentem diversos experiments usant bases de dades públiques
i hem participat en diverses competicions internacionals. A més, hem
publicat com a codi obert la majoria del software desenvolupat en
aquesta tesi.
També hem explorat algunes de les aplicacions del reconeixement
d'expressions matemàtiques. A més de les aplicacions directes de
transcripció i digitalització, presentem dues propostes
importants. En primer lloc, desenvolupem mucaptcha, un mètode per
discriminar entre humans i ordinadors mitjançant l'escriptura
d'expressions matemàtiques, el qual representa una nova aplicació del
reconeixement de fórmules. En segon lloc, abordem el problema de
detectar i segmentar l'estructura de documents utilitzant el marc
estadístic formal desenvolupat en aquesta tesi, donat que ambdós són
problemes bidimensionals que poden modelar-se amb gramàtiques
probabilístiques.
El mètode desenvolupat en aquesta tesi per reconèixer expressions
matemàtiques ha obtingut bons resultats a diferents nivells. Aquest
treball ha produït diverses publicacions en conferències
internacionals i revistes, i ha sigut premiat en competicions
internacionals. / Álvaro Muñoz, F. (2015). Mathematical Expression Recognition based on Probabilistic Grammars [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/51665
|