101 |
Scale Invariant Object Recognition Using Cortical Computational Models and a Robotic Platform
Voils, Danny 01 January 2012
This paper proposes an end-to-end, scale-invariant visual object recognition system composed of computational components that mimic the cortex in the brain. The system uses a two-stage process. The first stage is a filter that extracts scale-invariant features from the visual field. The second stage uses inference-based spatio-temporal analysis of these features to identify objects in the visual field. The proposed model combines Numenta's Hierarchical Temporal Memory (HTM) with HMAX, developed by MIT's Brain and Cognitive Science Department. While these two biologically inspired paradigms are based on what is known about the visual cortex, HTM and HMAX tackle the overall object recognition problem from different directions. Image-pyramid-based methods like HMAX make explicit use of scale, but have no sense of time. HTM, on the other hand, only indirectly tackles scale, but makes explicit use of time. By combining HTM and HMAX, both scale and time are addressed. In this paper, I show that HTM and HMAX can be combined to make a complete cortex-inspired object recognition model that explicitly uses both scale and time to recognize objects in temporal sequences of images. Additionally, through experimentation, I examine several variations of HMAX and its
|
102 |
Ensemble Methods for Historical Machine-Printed Document Recognition
Lund, William B. 03 April 2014
The usefulness of digitized documents is directly related to the quality of the extracted text. Optical Character Recognition (OCR) has reached a point where well-formatted and clean machine-printed documents are easily recognizable by current commercial OCR products; however, older or degraded machine-printed documents present problems to OCR engines resulting in word error rates (WER) that severely limit either automated or manual use of the extracted text. Major archives of historical machine-printed documents are being assembled around the globe, requiring an accurate transcription of the text for the automated creation of descriptive metadata, full-text searching, and information extraction. Given document images to be transcribed, ensemble recognition methods with multiple sources of evidence from the original document image and information sources external to the document have been shown in this and related work to improve output. This research introduces new methods of evidence extraction, feature engineering, and evidence combination to correct errors from state-of-the-art OCR engines. This work also investigates the success and failure of ensemble methods in the OCR error correction task, as well as the conditions under which these ensemble recognition methods reduce the Word Error Rate (WER), improving the quality of the OCR transcription, showing that the average document word error rate can be reduced below the WER of a state-of-the-art commercial OCR system by between 7.4% and 28.6% depending on the test corpus and methods. This research on OCR error correction contributes within the larger field of ensemble methods as follows. Four unique corpora for OCR error correction are introduced: The Eisenhower Communiqués, a collection of typewritten documents from 1944 to 1945; The Nineteenth Century Mormon Articles Newspaper Index from 1831 to 1900; and two synthetic corpora based on the Enron (2001) and the Reuters (1997) datasets.
The Reverse Dijkstra Heuristic is introduced as a novel admissible heuristic for the A* exact alignment algorithm. The impact of the heuristic is a dramatic reduction in the number of nodes processed during text alignment as compared to the baseline method. From the aligned text, the method developed here creates a lattice of competing hypotheses for word tokens. In contrast to much of the work in this field, the word token lattice is created from a character alignment, preserving split and merged tokens within the hypothesis columns of the lattice. This alignment method more explicitly identifies competing word hypotheses which may otherwise have been split apart by a word alignment. Lastly, this research explores, in order of increasing contribution to word error rate reduction: voting among hypotheses, decision lists based on an in-domain training set, ensemble recognition methods with novel feature sets, multiple binarizations of the same document image, and training on synthetic document images.
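The simplest of the combination methods listed above, voting among hypotheses, can be illustrated with a small sketch. This is not the thesis's implementation: the function name `vote_columns`, the list-of-columns lattice layout, the empty-string gap marker, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def vote_columns(lattice):
    """Pick the most frequent hypothesis in each column of an aligned lattice.

    `lattice` is a list of columns; each column holds one hypothesis per OCR
    source. An empty string marks a gap in the alignment (assumption of this
    sketch). Ties go to the hypothesis listed first in the column.
    """
    words = []
    for column in lattice:
        counts = Counter(column)
        # max() returns the first maximal element, so earlier sources win ties.
        best = max(column, key=lambda hypothesis: counts[hypothesis])
        if best:  # skip columns where the winning hypothesis is a gap
            words.append(best)
    return " ".join(words)


# Hypothetical lattice from three OCR outputs. A split token ("qu ick")
# stays inside a single column because the alignment is character-based.
lattice = [
    ["the", "the", "tlie"],
    ["quick", "qu ick", "quick"],
    ["fox", "", "fox"],
]
print(vote_columns(lattice))  # prints "the quick fox"
```

Plain majority voting ignores how reliable each source is, which may be one reason the abstract ranks it lowest in contribution to word error rate reduction.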
|
103 |
Benchmarking Object Detection Algorithms for Optical Character Recognition of Odometer Mileage
Hjelm, Mandus, Andersson, Eric January 2022
Machine learning algorithms have achieved breakthroughs in many areas in recent decades. The hardest problems for machine learning have been tasks that humans solve intuitively, e.g. understanding natural language or recognizing specific objects in images. One way to overcome these problems is to let the computer learn from experience instead of implementing a pre-written program for the problem at hand; that is how neural networks came to be. Neural networks are widely used in image analysis, and object detection algorithms have evolved considerably in recent years. Two of these algorithms are Faster Region-based Convolutional Neural Networks (Faster R-CNN) and You Only Look Once (YOLO). The purpose of this thesis is to evaluate and benchmark state-of-the-art object detection methods and then analyze their performance at reading information from images. The information that we aim to extract is the digital and analog digits from the odometer of a car; this will be done through object recognition and region-based image analysis. Our models will be compared to the open-source Optical Character Recognition (OCR) model Tesseract, which is used in production by the Stockholm-based company Greater Than. In this project we will take a more modern approach and focus on two object detection models, Faster R-CNN and YOLO. When training these models, we will use transfer learning. This means that we will use models that are pre-trained, in our case on a dataset called ImageNet, specifically for object detection. We will then use the TRODO dataset, which consists of 2 389 images of car odometers, to train these models further. The models are then evaluated using mean average precision (mAP), prediction accuracy, and Levenshtein distance. Our findings are that the object detection models outperform Tesseract on all measures.
The highest mAP and accuracy are attained by Faster R-CNN, while the best results regarding Levenshtein distance are achieved by a YOLO model. The final result is clear: both of our approaches have more diversity and are far better than Tesseract for solving this specific problem.
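Of the three evaluation measures, Levenshtein distance is the one that rewards nearly correct readings. The sketch below shows the standard dynamic-programming definition applied to odometer strings; the function name and the sample readings are illustrative assumptions, not taken from the thesis.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical odometer readings: ground truth vs. a prediction.
print(levenshtein("123456", "128456"))  # one misread digit -> 1
print(levenshtein("123456", "12345"))   # one missing digit -> 1
```

Unlike exact-match accuracy, this measure distinguishes a reading that is off by one digit from one that is entirely wrong.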
|
104 |
Analogue meters in a digital world : Minimizing data size when offloading OCR processes
Davidsson, Robin, Sjölander, Fredrik January 2022
Introduction: Instead of replacing existing analogue water meters with Internet of Things (IoT) connected substitutes, an alternative would be to attach an IoT connected module to the analogue water meter that optically reads the meter value using Optical Character Recognition (OCR). Such a module would need to be battery-powered given that access to the electrical grid is typically limited near water meters. Research has shown that offloading the OCR process can reduce the power dissipation from the battery, and that this dissipation can be reduced even further by reducing the amount of data that is transmitted. Purpose: For the sake of minimising energy consumption in the proposed solution, the purpose of the study is to find out to what extent it is possible to reduce an input image’s file size by means of resolution, colour depth, and compression before the Google Cloud Vision OCR engine no longer returns feasible results. Method and implementation: 250 images of analogue water meter values were processed by the Google Cloud Vision OCR engine through 38 000 different combinations of resolution, colour depth, and upscaling. Results: The highest rate of successful OCR readings with a minimal file size was found among images with resolutions from 133 x 22 to 163 x 27 pixels and colour depths of 1 to 2 bits/pixel. Conclusion: The study shows that there is a potential for minimising data sizes, and thereby energy consumption, by offloading the OCR process by means of transmitting images of minimal file size.
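To give a sense of scale for the reported resolutions and colour depths, the sketch below estimates the raw pixel payload of such an image. It is a back-of-the-envelope calculation only: it assumes tightly packed pixels and ignores file headers, row padding, and compression, so real file sizes in the study will differ. The function name is hypothetical.

```python
def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel payload in bytes, rounded up to a whole byte.

    Assumes pixels are packed with no row padding and no file header.
    """
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # integer ceiling division


# The two ends of the range reported in the results.
print(raw_size_bytes(133, 22, 1))  # 2926 bits -> 366 bytes
print(raw_size_bytes(163, 27, 2))  # 8802 bits -> 1101 bytes
```

Under these assumptions the pixel data fits in roughly a kilobyte or less, which suggests why such images are attractive for battery-powered transmission.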
|
105 |
Computer Vision for Document Image Analysis and Text Extraction / Datorseende för analys av dokumentbilder och textutvinning
Benchekroun, Omar January 2022
Automatic document processing has been a subject of interest in the industry for the past few years, especially with the recent technological advances in Machine Learning and Computer Vision. This project investigates in depth a major component used in Document Image Processing known as Optical Character Recognition (OCR). First, an improvement upon an existing shallow CNN+LSTM model is proposed, using domain-specific data synthesis. We demonstrate that this model can achieve an accuracy of up to 97% on non-handwritten text, with an accuracy improvement of 24% when using synthetic data. Furthermore, we deal with handwritten text, which presents more challenges, including variance in writing style, slanting, and character ambiguity. A CNN+Transformer architecture is validated to recognize handwriting extracted from real-world insurance statement data. This model achieves a maximal accuracy of 92% on real-world data. Moreover, we demonstrate how a data pipeline relying on synthetic data can be a scalable and affordable solution for modern OCR needs.
|
106 |
Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing.
Al-Muhtaseb, Husni A. January 2010
Text recognition for Arabic has not been researched as thoroughly as for other languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications like postal address reading, check verification in banks, and office automation, there is great interest in searching scanned documents that are available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase in text readers for visually impaired people, and understanding filled forms.
This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of state-of-the-art Arabic OCR systems.
Statistical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov models (HMM) and other techniques.
Since there is no publicly available dataset of printed Arabic text for recognition purposes, it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters. The script provides an efficient representation of Arabic text in terms of effort and time.
Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images.
In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected.
The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase.
Eight different Arabic fonts were used in the classification phase. The recognition rates ranged from 98% to 99.9% depending on the font used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters with a recognition rate above 95%.
Moreover, the recognition of printed Arabic text with multi-fonts was also conducted using the same technique. Fonts were categorized into different groups. New high recognition results were achieved.
To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the recognition rate by more than 1%. / King Fahd University of Petroleum and Minerals (KFUPM)
|
107 |
Exploring Machine Learning Solutions in the Context of OCR Post-Processing of Invoices / Utforskning av Maskininlärningslösningar för Optisk Teckenläsningsefterbehandling av Fakturor
Dwyer, Jacob, Bertse, Sara January 2022
Large corporations receive and send large volumes of invoices containing various fields detailing a transaction. Such fields include VAT, due date, total amount, etc. One common way to automate invoice processing is optical character recognition (OCR). This technology entails automatic reading of characters from scanned images. One problem with invoices is that there is no universal layout standard. This creates difficulties when processing data from invoices with different layouts. This thesis aims to examine common errors in the output from Azure's Form Recognizer general document model and the ways in which machine learning (ML) can be used to solve the aforementioned problem, by providing error detection as a first step when classifying OCR output as correct or incorrect. To examine this, an analysis of common errors was made based on OCR output from 70 real invoices, and a Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned for invoice classification. The results show that the two most common OCR errors are: (i) extra words showing up in a field and (ii) words missing from a field. Together these two types of errors account for 51% of OCR errors. For correctness classification, a BERT-type Transformer model yielded an F-score of 0.982 on fabricated data. On real invoice data, the initial model yielded an F-score of 0.596. After additional fine-tuning, the F-score was raised to 0.832. The results of this thesis show that ML, while not entirely reliable, may be a viable first step in assessment and correction of OCR errors for invoices.
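The F-scores quoted above summarize how well the classifier separates correct from incorrect OCR output. The sketch below computes the balanced F-score (F1) from confusion counts. The abstract does not state which F-score variant was used, so F1 is an assumption here, and the counts in the example are made up for illustration.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score: the harmonic mean of precision and recall.

    Equivalent to 2*TP / (2*TP + FP + FN). Returns 0.0 when undefined.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical counts: 80 fields correctly flagged, 10 false alarms, 20 misses.
score = f1_score(true_positives=80, false_positives=10, false_negatives=20)
print(f"{score:.3f}")  # prints 0.842
```

Because the F-score ignores true negatives, it stays informative when one class (for example, incorrect fields) is much rarer than the other.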
|
108 |
Sequence-to-Sequence Learning using Deep Learning for Optical Character Recognition (OCR)
Mishra, Vishal Vijayshankar January 2017
No description available.
|
109 |
Text Localization for Unmanned Ground Vehicles
Kirchhoff, Allan Richard 16 October 2014
Unmanned ground vehicles (UGVs) are increasingly being used for civilian and military applications. Passive sensors, such as visible-light cameras, are being used for navigation and object detection. An additional object of interest in many environments is text. Text information can supplement the autonomy of unmanned ground vehicles. Text most often appears in the environment in the form of road signs and storefront signs. Road hazard information, unmapped route detours, and traffic information are available to human drivers through road signs. Premade road maps lack these traffic details, but with text localization the vehicle could fill the information gaps. Leading text localization algorithms achieve ~60% accuracy; however, practical applications are cited to require at least 80% accuracy [49].
The goal of this thesis is to test existing text localization algorithms against challenging scenes, identify the best candidate, and optimize it for scenes a UGV would encounter. Promising text localization methods were tested against a custom dataset created to best represent scenes a UGV would encounter. The dataset includes road signs and storefront signs against complex backgrounds. The methods tested were adaptive thresholding, the stroke filter, and the stroke width transform. A temporal tracking proof of concept was also tested. It tracked text through a series of frames in order to reduce false positives.
The best results were obtained using the stroke width transform with temporal tracking, which achieved an accuracy of 79%. That level of performance approaches the requirements for use in practical applications. Without temporal tracking, the stroke width transform yielded an accuracy of 46%. The runtime was 8.9 seconds per image, which is 44.5 times slower than necessary for real-time object tracking. Converting the MATLAB code to C++ and running the text localization on a GPU could provide the necessary speedup. / Master of Science
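The real-time target implied by the runtime figures can be recovered with simple arithmetic. The sketch below derives it from the two numbers stated in the abstract; the resulting per-frame budget and frame rate are inferred here rather than quoted from the thesis, and the variable names are illustrative.

```python
# Figures stated in the abstract.
measured_seconds_per_image = 8.9
required_speedup = 44.5  # "44.5 times slower than necessary"

# Per-frame time budget implied by those figures, and the matching frame rate.
target_seconds_per_image = measured_seconds_per_image / required_speedup
target_frames_per_second = 1.0 / target_seconds_per_image

print(f"{target_seconds_per_image:.2f} s per image")       # prints 0.20 s per image
print(f"{target_frames_per_second:.1f} frames per second")  # prints 5.0 frames per second
```

In other words, the figures are consistent with a real-time requirement of about five frames per second.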
|
110 |
Sistema de reconocimiento de texto mecanografiado mediante redes neuronales para la gestión de boletas de pago en la Ugel Ferreñafe
Bonilla Vilchez, Jonathan Alonso January 2024
In this project, a study was carried out with the objective of developing an optical character recognition (OCR) system designed to identify and store the information from teacher pay slips at UGEL Ferreñafe. The motivation was the need to speed up the search for pay slips kept in physical form, a process that could sometimes take weeks and require hiring additional staff. This problem prompted the search for an effective and cost-effective solution.
Following the SCRUM and CRISP-DM methodologies, neural networks (NN) were chosen as the main technique. This choice was based on previous research and on trends identified in Google Trends. The fundamental objective was a low character error rate, and a rate of 1.8% was achieved, despite the degradation of the ink on many pay slips due to the passage of time.
To evaluate the usability of the system, the System Usability Scale (SUS) was applied, and the system obtained a score of 80, exceeding initial expectations. This highlights the high usability of the developed application and the satisfaction of its end users.
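For readers unfamiliar with the SUS, the sketch below shows the standard scoring rule for a single questionnaire: ten items answered on a 1 to 5 scale, with odd-numbered items contributing (response - 1), even-numbered items contributing (5 - response), and the sum multiplied by 2.5. The responses in the example are invented, and a study score such as the 80 reported above would normally be the mean over all respondents.

```python
def sus_score(responses):
    """Score one System Usability Scale questionnaire on a 0-100 scale.

    `responses` holds the ten answers in questionnaire order, each from 1
    (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1  # odd items are positively worded
        else:
            total += 5 - response  # even items are negatively worded
    return total * 2.5


# Hypothetical respondent.
print(sus_score([5, 1, 4, 2, 4, 2, 4, 2, 4, 2]))  # prints 80.0
```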
|