101

Benchmarking Object Detection Algorithms for Optical Character Recognition of Odometer Mileage

Hjelm, Mandus, Andersson, Eric January 2022 (has links)
Machine learning algorithms have had breakthroughs in many areas over the last decades. The hardest problems to solve with machine learning have been the tasks humans solve intuitively, e.g. understanding natural language or recognizing specific objects in images. The way to overcome these problems is to let the computer learn from experience instead of executing a pre-written program for the problem at hand - that is how neural networks came to be. Neural networks are widely used in image analysis, and object detection algorithms have evolved considerably in recent years. Two of these algorithms are Faster Region-based Convolutional Neural Networks (Faster R-CNN) and You Only Look Once (YOLO). The purpose of this thesis is to evaluate and benchmark state-of-the-art object detection methods and then analyze their performance at reading information from images. The information we aim to extract is digital and analog digits from the odometer of a car; this is done through object recognition and region-based image analysis. Our models are compared to the open-source Optical Character Recognition (OCR) model Tesseract, which is in production at the Stockholm-based company Greater Than. In this project we take a more modern approach and focus on two object detection models, Faster R-CNN and YOLO. When training these models we use transfer learning, meaning we start from models pre-trained for object detection, in our case on the ImageNet dataset. We then train these models further on the TRODO dataset, which consists of 2,389 images of car odometers. The models are evaluated by mean average precision (mAP), prediction accuracy, and Levenshtein distance. Our findings are that the object detection models outperform Tesseract on all measurements.
The highest mAP and accuracy are attained by Faster R-CNN, while the best results regarding Levenshtein distance are achieved by a YOLO model. The final result is clear: both of our approaches are more versatile and far better than Tesseract for solving this specific problem.
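One of the evaluation measures named in the abstract above, Levenshtein distance, is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming computation (a generic implementation, not the thesis authors' code):

```python
def levenshtein(pred: str, truth: str) -> int:
    """Minimum number of single-character edits (insert, delete,
    substitute) needed to turn `pred` into `truth`."""
    m, n = len(pred), len(truth)
    # dp[j] holds the distance between the current prefix of `pred`
    # and truth[:j]; we keep only one row plus the diagonal value.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            prev, dp[j] = dp[j], min(dp[j] + 1,      # deletion
                                     dp[j - 1] + 1,  # insertion
                                     prev + cost)    # substitution
    return dp[n]
```

For example, an OCR reading of "12845" against a true mileage of "12345" has distance 1 (one substituted digit), whereas an exact read scores 0.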
102

Analogue meters in a digital world : Minimizing data size when offloading OCR processes

Davidsson, Robin, Sjölander, Fredrik January 2022 (has links)
Introduction: Instead of replacing existing analogue water meters with Internet of Things (IoT) connected substitutes, an alternative would be to attach an IoT-connected module to the analogue water meter that optically reads the meter value using Optical Character Recognition (OCR). Such a module would need to be battery-powered, given that access to the electrical grid is typically limited near water meters. Research has shown that offloading the OCR process can reduce the power drawn from the battery, and that this draw can be reduced even further by reducing the amount of data transmitted.  Purpose: To minimise energy consumption in the proposed solution, the purpose of the study is to find out to what extent an input image's file size can be reduced by means of resolution, colour depth, and compression before the Google Cloud Vision OCR engine no longer returns feasible results.  Method and implementation: 250 images of analogue water meter values were processed by the Google Cloud Vision OCR through 38,000 different combinations of resolution, colour depth, and upscaling.  Results: The highest rate of successful OCR readings with a minimal file size was found among images with resolutions between 133 x 22 and 163 x 27 pixels and colour depths between 1 and 2 bits/pixel.  Conclusion: The study shows that there is potential for minimising data sizes, and thereby energy consumption, when offloading the OCR process, by transmitting images of minimal file size.
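The trade-off the study exploits is simple arithmetic: the raw payload of an image scales with resolution and colour depth. A small illustrative sketch (the 640 x 480 baseline is an assumption for comparison; only the 133 x 22 at 1 bit/pixel setting comes from the study's results):

```python
def raw_image_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed payload size, in bits, of a width x height image
    at the given colour depth."""
    return width * height * bits_per_pixel

# A hypothetical full camera frame: 8-bit greyscale at 640 x 480.
full = raw_image_bits(640, 480, 8)   # 2,457,600 bits
# The smallest feasible setting reported in the study:
# 133 x 22 pixels at 1 bit/pixel.
tiny = raw_image_bits(133, 22, 1)    # 2,926 bits
reduction = full / tiny              # roughly 840x smaller, before compression
```

Every transmitted bit costs radio energy, so shrinking the payload by two to three orders of magnitude before compression is what makes battery-powered offloading plausible.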
103

Computer Vision for Document Image Analysis and Text Extraction / Datorseende för analys av dokumentbilder och textutvinning

Benchekroun, Omar January 2022 (has links)
Automatic document processing has been a subject of interest in industry for the past few years, especially with the recent technological advances in Machine Learning and Computer Vision. This project investigates in depth a major component of Document Image Processing known as Optical Character Recognition (OCR). First, an improvement upon an existing shallow CNN+LSTM model is proposed, using domain-specific data synthesis. We demonstrate that this model can achieve an accuracy of up to 97% on non-handwritten text, with an accuracy improvement of 24% when using synthetic data. Furthermore, we deal with handwritten text, which presents more challenges, including variance of writing style, slanting, and character ambiguity. A CNN+Transformer architecture is validated to recognize handwriting extracted from real-world insurance statement data, achieving a maximal accuracy of 92%. Moreover, we demonstrate how a data pipeline relying on synthetic data can be a scalable and affordable solution for modern OCR needs.
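The abstract's key idea, domain-specific data synthesis, amounts to generating ground-truth strings from the target domain and rendering them into training images with varied fonts and noise. A hedged sketch of the label-generation half (the label formats below are illustrative assumptions, not the thesis's actual distributions):

```python
import random

def synth_labels(n: int, seed: int = 0) -> list[str]:
    """Generate synthetic domain-specific ground-truth strings
    (here: amounts and dates, as might appear on insurance documents)
    to be rendered into training images by a separate step."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    labels = []
    for _ in range(n):
        if rng.random() < 0.5:
            # A monetary amount, e.g. "1234.56"
            labels.append(f"{rng.randint(0, 9999)}.{rng.randint(0, 99):02d}")
        else:
            # A date, e.g. "07/03/2021"
            labels.append(
                f"{rng.randint(1, 28):02d}/{rng.randint(1, 12):02d}/"
                f"{rng.randint(2000, 2024)}"
            )
    return labels
```

Pairing each generated label with a rendered image gives perfectly aligned training data at negligible cost, which is what makes the pipeline scalable.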
104

Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing.

Al-Muhtaseb, Husni A. January 2010 (has links)
Arabic text recognition has not been researched as thoroughly as that of other natural languages, yet the need for automatic Arabic text recognition is clear. In addition to traditional applications like postal address reading, cheque verification in banks, and office automation, there is large interest in searching scanned documents available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase of text readers for visually impaired people, and understanding filled forms. This research work aims to contribute to current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes that advance the performance of state-of-the-art Arabic OCR systems. Statistical and analytical analysis of Arabic text was carried out to estimate the occurrence probabilities of Arabic characters for use with Hidden Markov Models (HMMs) and other techniques. Since there was no publicly available dataset of printed Arabic text for recognition purposes, it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time. Based on the success of HMMs for speech and text recognition, their use for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMMs. Finally, a novel set of features, which resulted in high recognition rates across different fonts, was selected.
The developed techniques need no word or character segmentation before the classification phase, as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMMs for Arabic text, since segmentation tends to produce errors that propagate to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates ranged from 98% to 99.9% depending on the font. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages: a proof-of-concept experiment on English characters achieved a recognition rate of 98.9% using the same HMM setup, and the same techniques applied to Bangla characters achieved a recognition rate above 95%. The recognition of printed Arabic text in multiple fonts was also conducted using the same technique, with fonts categorized into different groups, and new high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. This module increased the recognition accuracy by more than 1%. / King Fahd University of Petroleum and Minerals (KFUPM)
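A bigram statistical language model of the kind named in the title scores how plausible a character sequence is, so that post-processing can prefer probable text over raw OCR hypotheses. A generic character-bigram sketch with add-one smoothing (not the thesis's actual model, and over Latin characters for readability):

```python
from collections import Counter

def bigram_model(corpus: str):
    """Estimate character-bigram probabilities from a training corpus,
    with add-one smoothing so unseen pairs get a small nonzero score."""
    pairs = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    vocab = len(set(corpus))

    def prob(a: str, b: str) -> float:
        return (pairs[(a, b)] + 1) / (unigrams[a] + vocab)

    return prob

def score(word: str, prob) -> float:
    """Product of bigram probabilities; higher means more plausible."""
    p = 1.0
    for a, b in zip(word, word[1:]):
        p *= prob(a, b)
    return p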
105

Exploring Machine Learning Solutions in the Context of OCR Post-Processing of Invoices / Utforskning av Maskininlärningslösningar för Optisk Teckenläsningsefterbehandling av Fakturor

Dwyer, Jacob, Bertse, Sara January 2022 (has links)
Large corporations receive and send large volumes of invoices containing various fields detailing a transaction, such as VAT, due date, and total amount. One common way to automate invoice processing is optical character recognition (OCR), the automatic reading of characters from scanned images. One problem with invoices is that there is no universal layout standard, which creates difficulties when processing data from invoices with different layouts. This thesis aims to examine common errors in the output of Azure's Form Recognizer general document model, and the ways in which machine learning (ML) can be used to address them by providing error detection as a first step, classifying OCR output as correct or incorrect. To examine this, an analysis of common errors was made based on OCR output from 70 real invoices, and a Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned for invoice classification. The results show that the two most common OCR errors are (i) extra words showing up in a field and (ii) words missing from a field; together these two types account for 51% of OCR errors. For correctness classification, a BERT-type Transformer model yielded an F-score of 0.982 on fabricated data. On real invoice data, the initial model yielded an F-score of 0.596; after additional fine-tuning, the F-score rose to 0.832. The results of this thesis show that ML, while not entirely reliable, may be a viable first step in the assessment and correction of OCR errors for invoices.
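The F-scores reported above are the harmonic mean of precision and recall for the binary correct/incorrect classification. For reference, computed from the confusion-matrix counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive, and
    false-negative counts of a binary classifier."""
    precision = tp / (tp + fp)   # of the items flagged, how many were right
    recall = tp / (tp + fn)      # of the true items, how many were flagged
    return 2 * precision * recall / (precision + recall)
```

An F-score of 0.982 therefore implies precision and recall are both near 0.98, while the drop to 0.596 on real invoices means one or both degraded sharply outside the fabricated data.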
106

Sequence-to-Sequence Learning using Deep Learning for Optical Character Recognition (OCR)

Mishra, Vishal Vijayshankar January 2017 (has links)
No description available.
107

Text Localization for Unmanned Ground Vehicles

Kirchhoff, Allan Richard 16 October 2014 (has links)
Unmanned ground vehicles (UGVs) are increasingly being used for civilian and military applications. Passive sensors, such as visible-spectrum cameras, are used for navigation and object detection. An additional object of interest in many environments is text: text information can supplement the autonomy of unmanned ground vehicles. Text most often appears in the environment in the form of road signs and storefront signs. Road hazard information, unmapped route detours, and traffic information are available to human drivers through road signs; premade road maps lack these details, but with text localization the vehicle could fill the information gaps. Leading text localization algorithms achieve ~60% accuracy; however, practical applications are cited to require at least 80% accuracy [49]. The goal of this thesis is to test existing text localization algorithms against challenging scenes, identify the best candidate, and optimize it for scenes a UGV would encounter. Promising text localization methods were tested against a custom dataset created to best represent scenes a UGV would encounter; the dataset includes road signs and storefront signs against complex backgrounds. The methods tested were adaptive thresholding, the stroke filter, and the stroke width transform. A temporal-tracking proof of concept was also tested: it tracked text through a series of frames in order to reduce false positives. Best results were obtained using the stroke width transform with temporal tracking, which achieved an accuracy of 79%; that level of performance approaches the requirements of practical applications. Without temporal tracking, the stroke width transform yielded an accuracy of 46%. The runtime was 8.9 seconds per image, which is 44.5 times slower than necessary for real-time object tracking. Converting the MATLAB code to C++ and running the text localization on a GPU could provide the necessary speedup. / Master of Science
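Localization accuracy of the kind reported above is commonly scored by the overlap between a detected text box and the ground-truth box, counting a detection as correct above some intersection-over-union threshold (0.5 is a common convention; the thesis's exact criterion is not given here). A sketch of the overlap measure:

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes,
    each given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (0 if disjoint).
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

Temporal tracking helps precisely because a spurious box rarely keeps a high IoU with itself across consecutive frames, while real sign text does.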
108

The impact of different reading/writing media on the education and employment of blind persons

Moodley, Sivalingum 30 June 2004 (has links)
Particularly in recent years, prompted by the need to gain greater independent access to a wider range of information, many persons who are blind make extensive use of screen access technology, optical character recognition devices, refreshable Braille displays and electronic notetakers in a variety of contexts. These reading and writing media have proved to be so useful and effective that debates have arisen in the literature on whether there is a decline in the use of Braille, or whether Braille as a reading and writing medium will become obsolete. Following a discussion on the development of tactual reading and writing media as part of an historical background to blindness, as well as an evaluation of the various reading and writing media used in South Africa by persons who are blind, this study, using a quantitative approach with a survey design, aimed to determine the impact of the various reading and writing media on the education and employment of persons who are blind. Based on the findings of the study, what emerges forcefully with regard to the preference of a medium for reading or writing is that a greater number of persons who are blind prefer Braille and computers with speech output. Notwithstanding this, there is support for the need to provide instruction in the use of the various reading and writing media, highlighting the critical value and role of the various media. Additionally, while persons who are blind appear to be convinced that computers will not replace Braille, they were divided on whether there is a decline in the use of Braille, and whether computers would replace audiotapes. Finally, conclusions, based mainly on the findings of the study, are drawn, and recommendations are made, both for future research and for an integrated reading and writing model. / Educational Studies / D.Ed. (Special Needs Education)
110

Simultaneous Detection and Validation of Multiple Ingredients on Product Packages: An Automated Approach : Using CNN and OCR Techniques / Simultant detektering och validering av flertal ingredienser på produktförpackningar: Ett automatiserat tillvägagångssätt : Genom användning av CNN och OCR tekniker

Farokhynia, Rodbeh, Krikeb, Mokhtar January 2024 (has links)
Manual proofreading of product packaging is a time-consuming and uncertain process that can pose significant challenges for companies, such as scalability issues, compliance risks, and high costs. This thesis introduces a novel solution, employing advanced computer vision and machine learning methods to automate the proofreading of multiple ingredient lists, corresponding to multiple products, simultaneously within a product package. By integrating Convolutional Neural Network (CNN) and Optical Character Recognition (OCR) techniques, this study examines the efficacy of automated proofreading in comparison with manual methods. The thesis involves analyzing product package artwork to identify ingredient lists using the YOLOv5 object detection algorithm, with the optical character recognition tool EasyOCR for ingredient extraction. Additionally, Python scripts are employed to extract ingredients from the corresponding INCI PDF files (documents listing the standardized names of ingredients used in cosmetic products). A comprehensive comparison is then conducted to evaluate the accuracy and efficiency of automated proofreading. The comparison of the ingredients extracted from the product packages against their corresponding INCI PDF files yielded a match of 12.7%. Despite the suboptimal result, insights from the study highlight the limitations of current detection and recognition algorithms when applied to complex artwork: for example, the trained YOLOv5 model cuts through sentences in the ingredient list, and EasyOCR cannot extract ingredients from vertically aligned product package images. The findings underscore the need for advancements in detection algorithms and OCR tools to effectively handle complex objects like product packaging designs. The study also suggests that companies, such as H&M, consider updating their artwork and INCI PDF files to align with the capabilities of current AI-driven tools. By doing so, they can enhance the efficiency and overall effectiveness of automated proofreading processes, thereby reducing errors and improving accuracy.
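The 12.7% figure comes from comparing ingredients extracted from the artwork against those listed in the INCI PDF. The abstract does not spell out the matching criterion, so the following set-based match rate is a plausible sketch, not the thesis's actual comparison:

```python
def match_rate(ocr_ingredients: list, inci_ingredients: list) -> float:
    """Share of INCI-listed ingredients that the OCR pipeline recovered,
    after normalising case and whitespace. Exact-string matching is an
    assumption; fuzzy matching would likely score higher."""
    def norm(items):
        return {" ".join(s.lower().split()) for s in items}
    ocr, inci = norm(ocr_ingredients), norm(inci_ingredients)
    return len(ocr & inci) / len(inci) if inci else 0.0
```

Under exact matching, a single OCR slip (a truncated word, a merged line) drops an ingredient entirely, which is consistent with the low match rate the study reports for complex artwork.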
