11 |
Intelligent Indexing: A Semi-Automated, Trainable System for Field Labeling. Clawson, Robert T. 01 September 2014 (has links) (PDF)
We present Intelligent Indexing: a general, scalable, collaborative approach to the indexing and transcription of non-machine-readable documents that exploits visual consensus and group labeling while harnessing human recognition and domain expertise. In our system, indexers work directly on the page and, with minimal context switching, can navigate the page, enter labels, and interact with the recognition engine. Interaction with the recognition engine occurs through preview windows that allow the indexer to quickly verify and correct recommendations, which is far more efficient than conventional, tedious post-correction and editing. Intelligent Indexing is a trainable system that improves over time and can provide benefit even without prior knowledge. A user study compared Intelligent Indexing to a basic, manual indexing system. Volunteers reported that Intelligent Indexing is less mentally fatiguing and more enjoyable than the manual system, and their results show that it reduces the time required to index census records by 30.2% while maintaining comparable accuracy. A video overview of this research is available on YouTube: https://www.youtube.com/watch?v=gqdVzEPnBEw
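The verify-or-correct loop described above can be summarized in a short sketch (hypothetical function and field names, not the system's actual code): each label proposed by the recognition engine is either accepted by the indexer or replaced, and every confirmed label is fed back to the engine as a new training example, which is how the system improves over time.

    def index_field(field_image, recognizer, ask_indexer):
        # Recognition engine proposes a label; the preview window shows it.
        suggestion, confidence = recognizer.predict(field_image)
        # Indexer verifies the suggestion or types a correction.
        label = ask_indexer(field_image, suggestion)
        # Confirmed labels become training data, so later suggestions improve.
        recognizer.train_on(field_image, label)
        return label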
|
12 |
Freeform Cursive Handwriting Recognition Using a Clustered Neural Network. Bristow, Kelly H. 08 1900 (has links)
Optical character recognition (OCR) software has advanced greatly in recent years. Machine-printed text can be scanned and converted to searchable text with word accuracy rates around 98%. Reasonably neat hand-printed text can be recognized with about 85% word accuracy. However, cursive handwriting remains a challenge, with state-of-the-art performance still around 75%. Algorithms based on hidden Markov models have been only moderately successful, while recurrent neural networks have delivered the best results to date. This thesis explored the feasibility of using a special type of feedforward neural network to convert freeform cursive handwriting to searchable text. The hidden nodes in this network were grouped into clusters, with each cluster trained to recognize a unique character bigram. The network was trained on writing samples that were pre-segmented and annotated. Post-processing was facilitated in part by using the network to identify overlapping bigrams that were then linked together to form words and sentences. With dictionary-assisted post-processing, the network achieved word accuracy of 66.5% on a small, proprietary corpus. The contributions of this thesis are threefold: 1) the novel clustered architecture of the feedforward neural network, 2) the development of an expanded set of observers combining image masks, modifiers, and feature characterizations, and 3) the use of overlapping bigrams as the textual working unit to assist in context analysis and reconstruction.
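The overlapping-bigram post-processing step lends itself to a small illustration. The sketch below is a toy example, under the assumption that the network reports the bigrams it detects in reading order (it is not the thesis code): adjacent bigram detections that share a character are chained back into a word.

    def link_bigrams(bigrams):
        # Chain bigram detections such as ["ca", "at", "ts"] into "cats".
        if not bigrams:
            return ""
        word = bigrams[0]
        for bigram in bigrams[1:]:
            if word[-1] == bigram[0]:   # bigrams overlap by one character
                word += bigram[1]
            else:                       # no overlap: fall back to concatenation
                word += bigram
        return word

    print(link_bigrams(["ca", "at", "ts"]))  # -> "cats"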
|
13 |
Deep Neural Networks for Large Vocabulary Handwritten Text Recognition / Réseaux de Neurones Profonds pour la Reconnaissance de Texte Manuscrit à Large Vocabulaire. Bluche, Théodore. 13 May 2015 (has links)
La transcription automatique du texte dans les documents manuscrits a de nombreuses applications, allant du traitement automatique des documents à leur indexation ou leur compréhension. L'une des approches les plus populaires de nos jours consiste à parcourir l'image d'une ligne de texte avec une fenêtre glissante, de laquelle un certain nombre de caractéristiques sont extraites, et modélisées par des Modèles de Markov Cachés (MMC). Quand ils sont associés à des réseaux de neurones, comme des Perceptrons Multi-Couches (PMC) ou Réseaux de Neurones Récurrents de type Longue Mémoire à Court Terme (RNR-LMCT), et à un modèle de langue, ces modèles produisent de bonnes transcriptions. D'autre part, dans de nombreuses applications d'apprentissage automatique, telles que la reconnaissance de la parole ou d'images, des réseaux de neurones profonds, comportant plusieurs couches cachées, ont récemment permis une réduction significative des taux d'erreur. Dans cette thèse, nous menons une étude poussée de différents aspects de modèles optiques basés sur des réseaux de neurones profonds dans le cadre de systèmes hybrides réseaux de neurones / MMC, dans le but de mieux comprendre et évaluer leur importance relative. Dans un premier temps, nous montrons que des réseaux de neurones profonds apportent des améliorations cohérentes et significatives par rapport à des réseaux ne comportant qu'une ou deux couches cachées, et ce quel que soit le type de réseau étudié, PMC ou RNR, et d'entrée du réseau, caractéristiques ou pixels. Nous montrons également que les réseaux de neurones utilisant les pixels directement ont des performances comparables à ceux utilisant des caractéristiques de plus haut niveau, et que la profondeur des réseaux est un élément important de la réduction de l'écart de performance entre ces deux types d'entrées, confirmant la théorie selon laquelle les réseaux profonds calculent des représentations pertinentes, de complexités croissantes, de leurs entrées, en apprenant les caractéristiques de façon automatique. Malgré la domination flagrante des RNR-LMCT dans les publications récentes en reconnaissance d'écriture manuscrite, nous montrons que des PMCs profonds atteignent des performances comparables. De plus, nous avons évalué plusieurs critères d'entraînement des réseaux. Avec un entraînement discriminant de séquences, nous reportons, pour des systèmes PMC/MMC, des améliorations comparables à celles observées en reconnaissance de la parole. Nous montrons également que la méthode de Classification Temporelle Connexionniste est particulièrement adaptée aux RNRs. Enfin, la technique du dropout a récemment été appliquée aux RNR. Nous avons testé son effet à différentes positions relatives aux connexions récurrentes des RNRs, et nous montrons l'importance du choix de ces positions. Nous avons mené nos expériences sur trois bases de données publiques, qui représentent deux langues (l'anglais et le français), et deux époques, en utilisant plusieurs types d'entrées pour les réseaux de neurones : des caractéristiques prédéfinies, et les simples valeurs de pixels. Nous avons validé notre approche en participant à la compétition HTRtS en 2014, où nous avons obtenu la deuxième place. Les résultats des systèmes présentés dans cette thèse, avec les deux types de réseaux de neurones et d'entrées, sont comparables à l'état de l'art sur les bases Rimes et IAM, et leur combinaison dépasse les meilleurs résultats publiés sur les trois bases considérées.
/ The automatic transcription of text in handwritten documents has many applications, from automatic document processing to indexing and document understanding. One of the most popular approaches nowadays consists in scanning the text line image with a sliding window, from which features are extracted and modeled by Hidden Markov Models (HMMs). Associated with neural networks, such as Multi-Layer Perceptrons (MLPs) or Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs), and with a language model, these models yield good transcriptions. On the other hand, in many machine learning applications, including speech recognition and computer vision, deep neural networks consisting of several hidden layers have recently produced a significant reduction of error rates. In this thesis, we have conducted a thorough study of different aspects of optical models based on deep neural networks in the hybrid neural network / HMM scheme, in order to better understand and evaluate their relative importance. First, we show that deep neural networks produce consistent and significant improvements over networks with one or two hidden layers, independently of the kind of neural network, MLP or RNN, and of input, handcrafted features or pixels. Then, we show that deep neural networks with pixel inputs compete with those using handcrafted features, and that depth plays an important role in reducing the performance gap between the two kinds of inputs, supporting the idea that deep neural networks effectively build hierarchical and relevant representations of their inputs, and that features are automatically learnt along the way. Despite the dominance of LSTM-RNNs in the recent handwriting recognition literature, we show that deep MLPs achieve comparable results. Moreover, we evaluated different training criteria. With sequence-discriminative training, we report improvements for MLP/HMMs similar to those observed in speech recognition. We also show how the Connectionist Temporal Classification framework is especially suited to RNNs. Finally, the novel dropout technique for regularizing neural networks was recently applied to LSTM-RNNs. We tested its effect at different positions in LSTM-RNNs, thus extending previous work, and we show that its position relative to the recurrent connections is important. We conducted the experiments on three public databases, representing two languages (English and French) and two epochs, using different kinds of neural network inputs: handcrafted features and pixels. We validated our approach by taking part in the HTRtS contest in 2014, where we obtained second place. The results of the final systems presented in this thesis, namely MLPs and RNNs, with handcrafted feature or pixel inputs, are comparable to the state of the art on Rimes and IAM. Moreover, the combination of these systems outperformed all published results on the three considered databases.
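As a concrete, heavily simplified illustration of the kind of optical model discussed above, the sketch below builds a deep bidirectional LSTM over sliding-window feature frames and trains it with the Connectionist Temporal Classification criterion. It assumes PyTorch and uses dummy data and layer sizes; it is not the thesis's actual system.

    import torch
    import torch.nn as nn

    class BLSTMOpticalModel(nn.Module):
        def __init__(self, n_features=56, n_hidden=128, n_layers=3, n_chars=80):
            super().__init__()
            # Several stacked recurrent layers: the "depth" studied in the thesis.
            self.rnn = nn.LSTM(n_features, n_hidden, num_layers=n_layers,
                               bidirectional=True)
            self.out = nn.Linear(2 * n_hidden, n_chars + 1)  # +1 for the CTC blank

        def forward(self, x):                    # x: (frames, batch, n_features)
            h, _ = self.rnn(x)
            return self.out(h).log_softmax(-1)   # per-frame character log-probabilities

    model = BLSTMOpticalModel()
    ctc = nn.CTCLoss(blank=80)                   # CTC needs no frame-level alignment
    frames, batch, target_len = 200, 4, 30
    x = torch.randn(frames, batch, 56)                    # dummy sliding-window features
    targets = torch.randint(0, 80, (batch, target_len))   # dummy character labels
    loss = ctc(model(x), targets,
               input_lengths=torch.full((batch,), frames, dtype=torch.long),
               target_lengths=torch.full((batch,), target_len, dtype=torch.long))
    loss.backward()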
|
14 |
Um framework para desenvolvimento de interfaces multimodais em aplicações de computação ubíqua / A framework for multimodal interfaces development in ubiquitous computing applications. Inacio Junior, Valter dos Reis. 26 April 2007 (has links)
Interfaces multimodais processam vários tipos de entrada do usuário, tais como voz, gestos e interação com caneta, de uma maneira combinada e coordenada com a saída multimídia do sistema. Aplicações que suportam a multimodalidade provêem um modo mais natural e flexível para a execução de tarefas em computadores, uma vez que permitem que usuários com diferentes níveis de habilidades escolham o modo de interação que melhor se adequa às suas necessidades. O uso de interfaces que fogem do estilo convencional de interação baseado em teclado e mouse vai de encontro ao conceito de computação ubíqua, que tem se estabelecido como uma área de pesquisa que estuda os aspectos tecnológicos e sociais decorrentes da integração de sistemas e dispositivos computacionais à ambientes. Nesse contexto, o trabalho aqui reportado visou investigar a implementação de interfaces multimodais em aplicações de computação ubíqua, por meio da construção de um framework de software para integração de modalidades de escrita e voz / Multimodal interfaces process several types of user input, such as voice, gestures, and pen interaction, in a manner combined and coordinated with the system's multimedia output. Applications that support multimodality provide a more natural and flexible way to execute tasks with computers, since they allow users with different levels of ability to choose the mode of interaction that best fits their needs. The use of interfaces that depart from the conventional keyboard-and-mouse style of interaction aligns with the concept of ubiquitous computing, which has been established as a research area studying the social and technological aspects arising from the integration of computing systems and devices into environments. In this context, the work reported here investigated the implementation of multimodal interfaces in ubiquitous computing applications by building a software framework for integrating handwriting and speech modalities.
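As a toy illustration of what "combined and coordinated" multimodal input can mean in practice (hypothetical types and function names, not the framework's API), the sketch below fuses pen and speech events that arrive within a short time window into a single multimodal command.

    from dataclasses import dataclass

    @dataclass
    class ModalityEvent:
        modality: str       # "pen" or "speech"
        payload: str        # recognized text or gesture label
        timestamp: float    # seconds

    def fuse(events, window=1.5):
        # Pair a pen event with a speech event occurring within `window` seconds.
        fused, pending = [], []
        for event in sorted(events, key=lambda e: e.timestamp):
            pending = [p for p in pending if event.timestamp - p.timestamp <= window]
            partner = next((p for p in pending if p.modality != event.modality), None)
            if partner is not None:
                fused.append((partner, event))
                pending.remove(partner)
            else:
                pending.append(event)
        return fused

    print(fuse([ModalityEvent("speech", "delete this word", 10.0),
                ModalityEvent("pen", "circle(word_3)", 10.4)]))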
|
16 |
End-to-End Full-Page Handwriting Recognition. Wigington, Curtis Michael. 01 May 2018 (has links)
Despite decades of research, offline handwriting recognition (HWR) of historical documents remains a challenging problem which, if solved, could greatly improve the searchability of online cultural heritage archives. Historical documents are plagued with noise, degradation, ink bleed-through, overlapping strokes, variation in the slope and slant of the writing, and inconsistent layouts. Often the documents in a collection have been written by thousands of authors, all of whom have significantly different writing styles. In order to better capture the variations in writing styles, we introduce a novel data augmentation technique. This method achieves state-of-the-art results on modern datasets written in English and French and on a historical dataset written in German. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network that finds the start position of handwriting lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) handwriting into dewarped images, and a CNN-LSTM network that reads the characters. SFR exceeds the performance of the winner of the ICDAR 2017 handwriting recognition competition, even when not using the provided competition region annotations.
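The three-stage structure of SFR can be summarized in a short sketch (hypothetical component names; the released SFR code differs): start-of-line proposals feed a line follower that produces dewarped line images, which a CNN-LSTM recognizer then transcribes.

    def recognize_page(page_image, start_of_line_net, line_follower, reader):
        # Stage 1: propose the start position of every handwriting line.
        transcripts = []
        for start in start_of_line_net(page_image):
            # Stage 2: incrementally follow the (possibly curved) line and dewarp it.
            line_image = line_follower(page_image, start)
            # Stage 3: read the characters of the dewarped line.
            transcripts.append(reader(line_image))
        return "\n".join(transcripts)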
|
17 |
Zařízení pro napodobení statických a dynamických vlastností písma / Device for Imitation of Static and Dynamic Handwriting Characteristics. Pawlus, Jan. January 2019 (has links)
This project deals with designing and assembling a system for imitating static and dynamic handwriting characteristics. The system's design centers on a special pen for capturing the handwriting characteristics, on processing those characteristics, and on imitating them with a 3D printer altered for this purpose. The topic is interesting because research in this specific field that includes a real demonstration of how a signature can be forged from dynamic handwriting characteristics, rather than by a forger's hand, barely exists. The problem with preventing forgery is that we need to know the attack well, which is the clear motivation for this project.
|
18 |
A Multitask Learning Encoder-N-Decoder Framework for Movie and Video Description. Nina, Oliver A. 11 October 2018 (has links)
No description available.
|
19 |
Deep Learning for Document Image Analysis. Tensmeyer, Christopher Alan. 01 April 2019 (has links)
Automatic machine understanding of documents from image inputs enables many applications in modern document workflows, digital archives of historical documents, and general machine intelligence, among others. Together, the techniques for understanding document images comprise the field of Document Image Analysis (DIA). Within DIA, the research community has identified several sub-problems, such as page segmentation and Optical Character Recognition (OCR). As the field has matured, there has been a trend of moving away from heuristic-based methods, designed for particular tasks and domains of documents, and towards machine learning methods that learn to solve tasks from examples of input/output pairs. Within machine learning, a particular class of models, known as deep learning models, has established itself as the state of the art for many image-based applications, including DIA. While traditional machine learning models typically operate on features designed by researchers, deep learning models are able to learn task-specific features directly from raw pixel inputs. This dissertation is a collection of papers that propose several deep learning models to solve a variety of tasks within DIA. The first task is historical document binarization, where an input image of a degraded historical document is converted to a bi-tonal image to separate foreground text from background regions. The next part of the dissertation considers document segmentation problems, including identifying the boundary between the document page and its background, as well as segmenting an image of a data table into rows, columns, and cells. Finally, a variety of deep models are proposed to solve recognition tasks. These tasks include whole document image classification, identifying the font of a given piece of text, and transcribing handwritten text in low-resource languages.
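As a minimal sketch of how binarization can be cast as per-pixel classification (assuming PyTorch, dummy data, and a deliberately tiny network; the dissertation's architectures are considerably more elaborate), a small fully convolutional model maps a grayscale page patch to a foreground/background mask.

    import torch
    import torch.nn as nn

    binarizer = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, kernel_size=1),                  # one logit per pixel
    )

    image = torch.rand(1, 1, 256, 256)                    # dummy degraded page patch
    target = (torch.rand(1, 1, 256, 256) > 0.9).float()   # dummy ground-truth ink mask
    loss = nn.BCEWithLogitsLoss()(binarizer(image), target)
    loss.backward()
    bitonal = torch.sigmoid(binarizer(image)) > 0.5       # bi-tonal output image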
|
20 |
Mathematical Expression Recognition based on Probabilistic Grammars. Álvaro Muñoz, Francisco. 15 June 2015 (has links)
[EN] Mathematical notation is well-known and used all over the
world. Humankind has evolved from simple methods for representing
counts to the current well-defined math notation able to account for
complex problems. Furthermore, mathematical expressions constitute a
universal language in scientific fields, and many information
resources containing mathematics have been created during the last
decades. However, in order to efficiently access all that information,
scientific documents have to be digitized or produced directly in
electronic formats.
Although most people are able to understand and produce mathematical
information, introducing math expressions into electronic devices
requires learning specific notations or using editors. Automatic
recognition of mathematical expressions aims at filling this gap
between the knowledge of a person and the input accepted by
computers. This way, printed documents containing math expressions
could be automatically digitized, and handwriting could be used for
direct input of math notation into electronic devices.
This thesis is devoted to develop an approach for mathematical
expression recognition. In this document we propose an approach for
recognizing any type of mathematical expression (printed or
handwritten) based on probabilistic grammars. In order to do so, we
develop the formal statistical framework that derives several
probability distributions. Along the document, we deal with the
definition and estimation of all these probabilistic sources of
information. Finally, we define the parsing algorithm that globally
computes the most probable mathematical expression for a given input
according to the statistical framework.
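As a toy illustration of parsing with probabilistic grammars (a one-dimensional sketch with a made-up grammar, not the thesis software, which handles the two-dimensional layout of math expressions), the probabilistic CYK routine below computes the probability of the best derivation of a token sequence under a grammar in Chomsky normal form.

    from collections import defaultdict

    # Toy grammar in Chomsky normal form: lhs -> list of (rhs tuple, probability).
    rules = {
        "Expr": [(("Expr", "PlusTerm"), 0.4), (("digit",), 0.6)],
        "PlusTerm": [(("Plus", "Expr"), 1.0)],
        "Plus": [(("plus",), 1.0)],
    }

    def best_parse_probability(tokens, rules, start="Expr"):
        n = len(tokens)
        best = defaultdict(float)                 # (i, j, symbol) -> best probability
        for i, token in enumerate(tokens):        # terminal rules fill length-1 spans
            for lhs, productions in rules.items():
                for rhs, p in productions:
                    if rhs == (token,):
                        best[(i, i + 1, lhs)] = max(best[(i, i + 1, lhs)], p)
        for span in range(2, n + 1):              # combine adjacent sub-spans bottom-up
            for i in range(n - span + 1):
                j = i + span
                for split in range(i + 1, j):
                    for lhs, productions in rules.items():
                        for rhs, p in productions:
                            if len(rhs) == 2:
                                q = p * best[(i, split, rhs[0])] * best[(split, j, rhs[1])]
                                best[(i, j, lhs)] = max(best[(i, j, lhs)], q)
        return best[(0, n, start)]

    print(best_parse_probability(["digit", "plus", "digit"], rules))  # -> 0.144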
An important point in this study is to provide objective performance
evaluation and report results using public data and standard
metrics. We inspected the problems of automatic evaluation in this
field and looked for the best solutions. We also report several
experiments using public databases and we participated in several
international competitions. Furthermore, we have released most of the
software developed in this thesis as open source.
We also explore some of the applications of mathematical expression
recognition. In addition to the direct applications of transcription
and digitization, we report two important proposals. First, we
developed mucaptcha, a method to tell humans and computers apart by
means of math handwriting input, which represents a novel application
of math expression recognition. Second, we tackled the problem of
layout analysis of structured documents using the statistical
framework developed in this thesis, because both are two-dimensional
problems that can be modeled with probabilistic grammars.
The approach developed in this thesis for mathematical expression
recognition has obtained good results at different levels. It has
produced several scientific publications in international conferences
and journals, and has been awarded in international competitions. / [ES] La notación matemática es bien conocida y se utiliza en todo el
mundo. La humanidad ha evolucionado desde simples métodos para
representar cuentas hasta la notación formal actual capaz de modelar
problemas complejos. Además, las expresiones matemáticas constituyen
un idioma universal en el mundo científico, y se han creado muchos
recursos que contienen matemáticas durante las últimas décadas. Sin
embargo, para acceder de forma eficiente a toda esa información, los
documentos científicos han de ser digitalizados o producidos
directamente en formatos electrónicos.
Aunque la mayoría de personas es capaz de entender y producir
información matemática, introducir expresiones matemáticas en
dispositivos electrónicos requiere aprender notaciones especiales o
usar editores. El reconocimiento automático de expresiones matemáticas
tiene como objetivo llenar ese espacio existente entre el conocimiento
de una persona y la entrada que aceptan los ordenadores. De este modo,
documentos impresos que contienen fórmulas podrían digitalizarse
automáticamente, y la escritura se podría utilizar para introducir
directamente notación matemática en dispositivos electrónicos.
Esta tesis está centrada en desarrollar un método para reconocer
expresiones matemáticas. En este documento proponemos un método para
reconocer cualquier tipo de fórmula (impresa o manuscrita) basado en
gramáticas probabilísticas. Para ello, desarrollamos el marco
estadístico formal que deriva varias distribuciones de probabilidad. A
lo largo del documento, abordamos la definición y estimación de todas
estas fuentes de información probabilística. Finalmente, definimos el
algoritmo que, dada cierta entrada, calcula globalmente la expresión
matemática más probable de acuerdo al marco estadístico.
Un aspecto importante de este trabajo es proporcionar una evaluación
objetiva de los resultados y presentarlos usando datos públicos y
medidas estándar. Por ello, estudiamos los problemas de la evaluación
automática en este campo y buscamos las mejores soluciones. Asimismo,
presentamos diversos experimentos usando bases de datos públicas y
hemos participado en varias competiciones internacionales. Además,
hemos publicado como código abierto la mayoría del software
desarrollado en esta tesis.
También hemos explorado algunas de las aplicaciones del reconocimiento
de expresiones matemáticas. Además de las aplicaciones directas de
transcripción y digitalización, presentamos dos propuestas
importantes. En primer lugar, desarrollamos mucaptcha, un método para
discriminar entre humanos y ordenadores mediante la escritura de
expresiones matemáticas, el cual representa una novedosa aplicación
del reconocimiento de fórmulas. En segundo lugar, abordamos el
problema de detectar y segmentar la estructura de documentos
utilizando el marco estadístico formal desarrollado en esta tesis,
dado que ambos son problemas bidimensionales que pueden modelarse con
gramáticas probabilísticas.
El método desarrollado en esta tesis para reconocer expresiones
matemáticas ha obtenido buenos resultados a diferentes niveles. Este
trabajo ha producido varias publicaciones en conferencias
internacionales y revistas, y ha sido premiado en competiciones
internacionales. / [CA] La notació matemàtica és ben coneguda i s'utilitza a tot el món. La
humanitat ha evolucionat des de simples mètodes per representar
comptes fins a la notació formal actual capaç de modelar
problemes complexos. A més, les expressions matemàtiques
constitueixen un idioma universal al món científic, i s'han creat
molts recursos que contenen matemàtiques durant les últimes
dècades. No obstant això, per accedir de forma eficient a tota
aquesta informació, els documents científics han de ser
digitalitzats o produïts directament en formats electrònics.
Encara que la majoria de persones és capaç d'entendre i produir
informació matemàtica, introduir expressions matemàtiques en
dispositius electrònics requereix aprendre notacions especials o usar
editors. El reconeixement automàtic d'expressions matemàtiques
té per objectiu omplir aquest espai existent entre el coneixement
d'una persona i l'entrada que accepten els ordinadors. D'aquesta
manera, documents impresos que contenen fórmules podrien
digitalitzar-se automàticament, i l'escriptura es podria utilitzar per
introduir directament notació matemàtica en dispositius electrònics.
Aquesta tesi està centrada en desenvolupar un mètode per reconèixer
expressions matemàtiques. En aquest document proposem un mètode per
reconèixer qualsevol tipus de fórmula (impresa o manuscrita) basat en
gramàtiques probabilístiques. Amb aquesta finalitat, desenvolupem el
marc estadístic formal que deriva diverses distribucions de
probabilitat. Al llarg del document, abordem la definició i estimació
de totes aquestes fonts d'informació probabilística. Finalment,
definim l'algorisme que, donada certa entrada, calcula globalment
l'expressió matemàtica més probable d'acord al marc estadístic.
Un aspecte important d'aquest treball és proporcionar una avaluació
objectiva dels resultats i presentar-los usant dades públiques i
mesures estàndard. Per això, estudiem els problemes de l'avaluació
automàtica en aquest camp i busquem les millors solucions. Així
mateix, presentem diversos experiments usant bases de dades públiques
i hem participat en diverses competicions internacionals. A més, hem
publicat com a codi obert la majoria del software desenvolupat en
aquesta tesi.
També hem explorat algunes de les aplicacions del reconeixement
d'expressions matemàtiques. A més de les aplicacions directes de
transcripció i digitalització, presentem dues propostes
importants. En primer lloc, desenvolupem mucaptcha, un mètode per
discriminar entre humans i ordinadors mitjançant l'escriptura
d'expressions matemàtiques, el qual representa una nova aplicació del
reconeixement de fórmules. En segon lloc, abordem el problema de
detectar i segmentar l'estructura de documents utilitzant el marc
estadístic formal desenvolupat en aquesta tesi, donat que ambdós són
problemes bidimensionals que poden modelar-se amb gramàtiques
probabilístiques.
El mètode desenvolupat en aquesta tesi per reconèixer expressions
matemàtiques ha obtingut bons resultats a diferents nivells. Aquest
treball ha produït diverses publicacions en conferències
internacionals i revistes, i ha sigut premiat en competicions
internacionals. / Álvaro Muñoz, F. (2015). Mathematical Expression Recognition based on Probabilistic Grammars [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/51665
|