Global ETD Search

1	Underwater Document Recognition Shah, Jaimin Nitesh 18 May 2021 (has links) No description available. Computer Science image denoising image quality assessment - IQA optical character recognition - OCR
2	Ocr: A Statistical Model Of Multi-engine Ocr Systems McDonald, Mercedes Terre 01 January 2004 (has links) This thesis presents a benchmark of three commercial Optical Character Recognition (OCR) engines. The purpose of this benchmark is to characterize the performance of the OCR engines, with emphasis on the correlation of errors between the engines. The benchmarks evaluate whether a multi-OCR system employing a voting scheme can increase overall recognition accuracy. This is desirable since OCR systems are still unable to recognize characters with 100% accuracy. The existing error rates of OCR engines pose a major problem for applications where a single error can significantly affect outcomes, such as legal applications. The results obtained from this benchmark are the primary determining factor in the decision to implement a voting scheme. The experiment showed a very high accuracy rate for each of these commercial OCR engines. The average accuracy rate found for each engine was near 99.5%, based on a document of fewer than 6,000 words. While these error rates are very low, the goal is 100% accuracy in legal applications. Based on the work in this thesis, it has been determined that a simple voting scheme will help to improve the accuracy rate. Character recognition accuracy Machine readability Optical character recognition (OCR) Voting scheme Electrical and Computer Engineering Engineering
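As an illustration of the kind of simple voting scheme this abstract refers to, the following is a minimal sketch, not the thesis's method. It assumes the three engine outputs have already been aligned to equal length (a real system would need an alignment step first), and the function name `vote_characters` and the tie-breaking rule (fall back to the first engine) are hypothetical choices.

```python
from collections import Counter


def vote_characters(outputs):
    """Per-position majority vote over equal-length OCR outputs.

    Assumption: the strings are already aligned character by character.
    When no character wins a strict majority, the first engine's
    character is kept (an arbitrary, hypothetical tie-breaking rule).
    """
    if len({len(o) for o in outputs}) != 1:
        raise ValueError("outputs must be aligned to the same length")
    result = []
    for chars in zip(*outputs):
        winner, count = Counter(chars).most_common(1)[0]
        result.append(winner if count > len(chars) / 2 else chars[0])
    return "".join(result)


if __name__ == "__main__":
    # Each engine makes a different single-character error.
    print(vote_characters(["legal", "1egal", "legaI"]))
```

Voting of this kind only helps when the engines' errors are weakly correlated, which is why the benchmark emphasizes error correlation between engines.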
3	Retrofitting analogue meters with smart devices : A feasibility study of local OCR processes on an energy critical driven system Andreasson, Joel, Ehrenbåge, Elin January 2023 (has links) Internet of Things (IoT) devices are becoming increasingly popular replacements for their analogue counterparts. However, there is still demand to keep analogue equipment that is already installed, such as analogue water meters, while also monitoring it automatically. A proposed solution to this problem is to install a battery-powered add-on component that can optically read meter values using Optical Character Recognition (OCR) and transmit the readings wirelessly. Two ways to do this are to offload the OCR process to a server or to do the OCR processing locally on the add-on component. Since water meters are often located where reception is weak and the add-on component is battery powered, a suitable technology for data transmission could be Long Range (LoRa) because of its low-power and long-range capabilities. Since LoRa has a low transfer rate, data transfers need to be kept small, which could make offloading a less favorable alternative compared to local OCR processing. The purpose of this thesis is therefore to research the feasibility, in terms of energy efficiency, of doing local OCR processing on the add-on component. The feasibility condition of this study is defined as being able to continually read an analogue meter for a 10-year lifespan, while consuming under 2600 milliampere hours (mAh) of energy. The two OCR algorithms developed for this study are a specialized OCR algorithm that utilizes pattern matching principles, and a Sum of Absolute Differences (SAD) OCR algorithm. These two algorithms have been compared against each other, to determine which one is more suitable for the system. 
This comparison showed that the SAD algorithm was more suitable; it was then studied further using different image resolutions and settings to determine whether energy consumption could be reduced further. The results showed that it was possible to significantly reduce energy consumption by reducing the image resolution. The study also researched the possibility of reducing energy consumption further by not reading all digits on the tested water meter, depending on the measuring frequency and water flow. The study concluded that OCR processing is feasible on an energy critical driven system when reading analogue meters, depending on the measuring frequency. Analogue meters Energy efficiency Internet of Things (IoT) Optical Character Recognition (OCR) Sum of Absolute Differences (SAD) Template matching Computer Engineering
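The Sum of Absolute Differences (SAD) approach named in this abstract can be sketched as template matching: a digit patch is compared pixel by pixel with stored templates, and the template with the smallest total difference wins. The sketch below is illustrative only and is not the thesis's implementation; the 3x3 binary templates, the `TEMPLATES` table and the function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute pixel differences between two equal-size images."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify_digit(patch, templates):
    """Return the label of the template with the lowest SAD to the patch."""
    return min(templates, key=lambda label: sad(patch, templates[label]))


# Hypothetical 3x3 binary glyphs; a real meter reader would use larger
# grayscale templates cropped from actual meter images.
TEMPLATES = {
    "0": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

if __name__ == "__main__":
    noisy_zero = [[1, 1, 1], [1, 0, 1], [1, 1, 0]]  # one corrupted pixel
    print(classify_digit(noisy_zero, TEMPLATES))
```

Because SAD needs only subtraction, absolute value and addition, its cost scales directly with the number of pixels compared, which is consistent with the abstract's finding that lower image resolution reduces energy consumption.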
4	Complex Document Parsing with Vision Language Models Yifei Hu (9193709) 17 December 2024 (has links) This thesis explores the application of vision language models (VLMs) on document layout analysis (DLA) and optical character recognition (OCR). For document layout analysis, we found that VLMs excel at detecting text areas by leveraging their understanding of textual content, rather than relying solely on visual features. This approach proves more robust than traditional object detection methods, particularly for text-rich images typical in document analysis tasks. In addressing OCR challenges, we identified a critical bottleneck: the lack of high-quality, document-level OCR datasets. To overcome this limitation, we developed a novel synthetic data generation pipeline. This pipeline utilizes Large Language Models to create OCR training data by rendering markdown source text into images. Our experiments show that VLMs trained on this synthetic data outperform models trained on conventional datasets. This research highlights the potential of VLMs in document understanding tasks and introduces an innovative approach to generating training data for OCR. Our findings suggest that leveraging the dual image-text understanding capabilities of VLMs, combined with strategically generated synthetic data, can significantly advance the state of the art in document layout analysis and OCR. Natural language processing Parsing Information Optical Character Recognition (OCR) layout analysis
5	Arabic text recognition of printed manuscripts : efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing Al-Muhtaseb, Husni Abdulghani January 2010 (has links) Arabic text recognition was not researched as thoroughly as other natural languages. The need for automatic Arabic text recognition is clear. In addition to the traditional applications like postal address reading, check verification in banks, and office automation, there is a large interest in searching scanned documents that are available on the internet and for searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as first phase in text readers for visually impaired people and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of the state of the art Arabic OCR systems. Statistical and analytical analysis for Arabic Text was carried out to estimate the probabilities of occurrences of Arabic character for use with Hidden Markov models (HMM) and other techniques. Since there is no publicly available dataset for printed Arabic text for recognition purposes it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters. The script provides efficient representation for Arabic text in terms of effort and time. Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. 
In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected. The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates were in the range from 98% to 99.9% depending on the fonts used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters with a recognition rate above 95%. Moreover, the recognition of printed Arabic text with multi-fonts was also conducted using the same technique. Fonts were categorized into different groups. New high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character level post-processing and word level post-processing. The use of this module increased the accuracy of the recognition rate by more than 1%. 005.3
6	A Book Reader Design for Persons with Visual Impairment and Blindness Galarza, Luis E. 16 November 2017 (has links) The objective of this dissertation is to provide a new design approach to a fully automated book reader for individuals with visual impairment and blindness that is portable and cost effective. This approach relies on the geometry of the design setup and provides the mathematical foundation for integrating, in a unique way, a 3-D space surface map from a low-resolution time of flight (ToF) device with a high-resolution image as means to enhance the reading accuracy of warped images due to the page curvature of bound books and other magazines. The merits of this low cost, but effective automated book reader design include: (1) a seamless registration process of the two imaging modalities so that the low resolution (160 x 120 pixels) height map, acquired by an Argos3D-P100 camera, accurately covers the entire book spread as captured by the high resolution image (3072 x 2304 pixels) of a Canon G6 Camera; (2) a mathematical framework for overcoming the difficulties associated with the curvature of open bound books, a process referred to as the dewarping of the book spread images, and (3) image correction performance comparison between uniform and full height map to determine which map provides the highest Optical Character Recognition (OCR) reading accuracy possible. The design concept could also be applied to address the challenging process of book digitization. This method is dependent on the geometry of the book reader setup for acquiring a 3-D map that yields high reading accuracy once appropriately fused with the high-resolution image. The experiments were performed on a dataset consisting of 200 pages with their corresponding computed and co-registered height maps, which are made available to the research community (cate-book3dmaps.fiu.edu). 
Improvements to the character reading accuracy, due to the correction steps, were quantified and measured by introducing the corrected images to an OCR engine and tabulating the number of misrecognized characters. Furthermore, the resilience of the book reader was tested by introducing a rotational misalignment to the book spreads and comparing the OCR accuracy to that obtained with the standard alignment. The standard alignment yielded an average reading accuracy of 95.55% with the uniform height map (i.e., the height values of the central row of the 3-D map are replicated to approximate all other rows), and 96.11% with the full height maps (i.e., each row has its own height values as obtained from the 3D camera). When the rotational misalignments were taken into account, the results obtained produced average accuracies of 90.63% and 94.75% for the same respective height maps, proving added resilience of the full height map method to potential misalignments. Book reader curvature correction time of flight (ToF) device depth map digitization of text optical character recognition (OCR) assistive technology Electrical and Computer Engineering Signal Processing
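The uniform height map defined in this abstract (the central row's height values replicated to approximate all other rows) can be sketched in a few lines. This is only an illustration of that definition, assuming the height map is a list of rows of numeric heights; the function name `uniform_height_map` is hypothetical, and for an even number of rows the sketch arbitrarily takes the row at index `len // 2`.

```python
def uniform_height_map(height_map):
    """Approximate every row of a height map by its central row.

    height_map: non-empty list of rows, each a list of height values.
    Returns a new map of the same shape; the input is not modified.
    """
    if not height_map:
        raise ValueError("height map must contain at least one row")
    central = height_map[len(height_map) // 2]
    return [list(central) for _ in height_map]


if __name__ == "__main__":
    full = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(uniform_height_map(full))
```

The full height map, by contrast, keeps each row's own values, which is what the abstract reports as more resilient to rotational misalignment.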
7	A Possibilistic Approach To Handwritten Script Identification Via Morphological Methods For Pattern Representation Ghosh, Debashis 04 1900 (has links) (PDF) No description available. Morphology (Linguistics) Manuscripts - Morphology (Linguistics) Manuscripts - Pattern Analysis Optical Character Recognition (OCR) Pattern Recognition Vector Quantization (VQ) Handwritten Character Recognition Clustering Algorithms Script Recognition Computer Science.
8	Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing. Al-Muhtaseb, Husni A. January 2010 (has links) Arabic text recognition was not researched as thoroughly as other natural languages. The need for automatic Arabic text recognition is clear. In addition to the traditional applications like postal address reading, check verification in banks, and office automation, there is a large interest in searching scanned documents that are available on the internet and for searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as first phase in text readers for visually impaired people and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of the state of the art Arabic OCR systems. Statistical and analytical analysis for Arabic Text was carried out to estimate the probabilities of occurrences of Arabic character for use with Hidden Markov models (HMM) and other techniques. Since there is no publicly available dataset for printed Arabic text for recognition purposes it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters. The script provides efficient representation for Arabic text in terms of effort and time. Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. 
In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected. The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates were in the range from 98% to 99.9% depending on the fonts used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters with a recognition rate above 95%. Moreover, the recognition of printed Arabic text with multi-fonts was also conducted using the same technique. Fonts were categorized into different groups. New high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character level post-processing and word level post-processing. The use of this module increased the accuracy of the recognition rate by more than 1%. / King Fahd University of Petroleum and Minerals (KFUPM) Arabic text recognition Hidden Markov Models Feature extraction Omni font recognition Minimal Arabic script Bigram Statistical Language Model Optical character recognition (OCR) Statistical and analytical analysis
9	Sistema de reconocimiento de texto mecanografiado mediante redes neuronales para la gestión de boletas de pago en la Ugel Ferreñafe Bonilla Vilchez, Jonathan Alonso January 2024 (has links) In this project, a study was carried out with the objective of developing an optical character recognition (OCR) system designed to identify and store information from teacher pay slips at UGEL Ferreñafe. This was due to the need to speed up the search for physical pay slips, a process that could sometimes take weeks and require the hiring of additional staff. This problem prompted the search for an efficient and cost-effective solution. Following the SCRUM and CRISP-DM methodologies, it was decided to use neural networks as the main technique. 
This choice was based on previous research and trends identified in Google Trends. The fundamental objective was to achieve a low error rate in character recognition, and a significant milestone of 1.8% was reached, despite the degradation of the ink on many pay slips due to the passage of time. To evaluate the usability of the system, the System Usability Scale (SUS) was applied, and the system obtained a score of 80, exceeding initial expectations. This highlights the high usability of the developed application and the satisfaction of its end users. Optical Character Recognition (OCR) Process Automation Educational Technology
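The abstract does not say how its character-level error figure was computed. One common convention, shown here only as an assumption, is the character error rate: the Levenshtein edit distance between the reference text and the OCR output, divided by the length of the reference. The function names below are hypothetical.

```python
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of six.
    print(character_error_rate("boleta", "bo1eta"))
```

Under this convention, a rate of 1.8% would correspond to roughly 18 character edits per 1,000 reference characters.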
10	Simultaneous Detection and Validation of Multiple Ingredients on Product Packages: An Automated Approach : Using CNN and OCR Techniques / Simultant detektering och validering av flertal ingredienser på produktförpackningar: Ett automatiserat tillvägagångssätt : Genom användning av CNN och OCR tekniker Farokhynia, Rodbeh, Krikeb, Mokhtar January 2024 (has links) Manual proofreading of product packaging is a time-consuming and uncertain process that can pose significant challenges for companies, such as scalability issues, compliance risks and high costs. This thesis work introduces a novel solution by employing advanced computer vision and machine learning methods to automate the proofreading of multiple ingredient lists corresponding to multiple products simultaneously within a product package. By integrating Convolutional Neural Network (CNN) and Optical Character Recognition (OCR) techniques, this study examines the efficacy of automated proofreading in comparison to manual methods. The thesis involves analyzing product package artwork to identify ingredient lists utilizing the YOLOv5 object detection algorithm and the optical character recognition tool EasyOCR for ingredient extraction. Additionally, Python scripts are employed to extract ingredients from corresponding INCI PDF files (documents that list the standardized names of ingredients used in cosmetic products). A comprehensive comparison is then conducted to evaluate the accuracy and efficiency of automated proofreading. The comparison of the extracted ingredients from the product packages and their corresponding INCI PDF files yielded a match of 12.7%. Despite the suboptimal result, insights from the study highlight the limitations of current detection and recognition algorithms when applied to complex artwork. 
For example, the trained YOLOv5 model cuts through sentences in the ingredient list, and EasyOCR cannot extract ingredients from vertically aligned product package images. The findings underscore the need for advancements in detection algorithms and OCR tools to effectively handle complex objects like product packaging designs. The study also suggests that companies, such as H&M, consider updating their artwork and INCI PDF files to align with the capabilities of current AI-driven tools. By doing so, they can enhance the efficiency and overall effectiveness of automated proofreading processes, thereby reducing errors and improving accuracy. Proofreading product packaging automation computer vision machine learning Convolutional Neural Network (CNN) Optical Character Recognition (OCR) YOLOv5 EasyOCR accuracy Other Computer and Information Science
