11

Algorithms for the recognition of poor quality documents

Raza, Ghulam January 1998 (has links)
No description available.
12

Test av OCR-verktyg för Linux / OCR software tests for Linux

Nilsson, Elin January 2010 (has links)
This report is about finding an OCR tool for digitizing paper documents. Among the requirements, the tool had to be compatible with Linux, accept commands via the command line, and handle Scandinavian characters. Twelve OCR tools were reviewed and three were selected: Ocrad, Tesseract and OCR Shop XTR. To test them, two documents were scanned and digitized with each tool. The tests show that Tesseract is the most accurate tool and Ocrad the fastest, while OCR Shop XTR gives the worst results in both timing and number of correctly recognized words.
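A comparison of this kind is straightforward to reproduce. Below is a minimal sketch assuming the tesseract and ocrad binaries (and a Swedish traineddata pack for Tesseract) are installed; the file names and the manually transcribed reference are hypothetical, and the word-accuracy measure is deliberately crude.

```python
# Benchmark two command-line OCR tools on one scanned page: wall-clock time
# plus a rough word-accuracy score against a manual transcript.
import subprocess
import time

PAGE = "scan.pnm"               # hypothetical scan (ocrad reads PBM/PGM/PPM)
REFERENCE = "ground_truth.txt"  # hypothetical manual transcript

def run(cmd):
    """Run one OCR command; return (elapsed seconds, stdout text)."""
    start = time.perf_counter()
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, out.stdout

def word_accuracy(text, reference):
    """Fraction of reference words that appear somewhere in the OCR output."""
    got = set(text.split())
    ref = reference.split()
    return sum(w in got for w in ref) / len(ref)

reference = open(REFERENCE, encoding="utf-8").read()

# tesseract writes to stdout when the output base is "-"; "-l swe" selects Swedish.
# ocrad prints recognized text to stdout by default.
for name, cmd in [("tesseract", ["tesseract", PAGE, "-", "-l", "swe"]),
                  ("ocrad", ["ocrad", PAGE])]:
    elapsed, text = run(cmd)
    print(f"{name}: {elapsed:.2f}s, word accuracy {word_accuracy(text, reference):.1%}")
```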
13

Classifying Receipts and Invoices in Visma Mobile Scanner

Yasser, Almodhi January 2016 (has links)
This paper presents a study on classifying receipts and invoices using machine learning, and discusses the Naïve Bayes algorithm and the advantages of using it. Drawing on theory and previous research, it shows how to classify an image as either a receipt or an invoice. It also covers pre-processing the images with a variety of methods and extracting their text with Optical Character Recognition (OCR), and discusses why pre-processing is necessary to reach higher accuracy. The results include a comparison between the Tesseract OCR engine and the FineReader OCR engine, and show that combining the FineReader OCR engine with machine learning increases the accuracy of the image classification.
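The classification step described here can be sketched with scikit-learn's Naïve Bayes implementation. This is an illustration, not the thesis's code; the training snippets below are made-up placeholders standing in for text extracted by an OCR engine.

```python
# Naive Bayes over OCR-extracted text: bag-of-words features feed a
# multinomial model that labels each document "receipt" or "invoice".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical OCR output from labelled training images.
texts = [
    "total 249.00 cash change vat receipt",   # receipt
    "invoice number due date payment terms",  # invoice
    "subtotal card approved thank you",       # receipt
    "bill to invoice amount due net 30",      # invoice
]
labels = ["receipt", "invoice", "receipt", "invoice"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Classify the OCR text of a new image; expected output: ['invoice'].
print(model.predict(["due date 2016-05-01 invoice no 1042"]))
```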
14

Mapping charge to function relationships of the DNA mimic protein Ocr

Kanwar, Nisha January 2014 (has links)
This thesis investigates the functional consequences of neutralising the negative charges on the bacteriophage T7 antirestriction protein ocr. The ocr molecule is a small, highly negatively charged protein homodimer that mimics a short DNA duplex when binding to the Type I Restriction Modification (RM) system. Thus, ocr facilitates phage infection by binding to and inactivating the host RM system. The aim of this study was to analyse the effect of reducing the negative charge on the ocr molecule by mutating the acidic residues of the protein. The ocr molecule (117 residues) is replete with Asp and Glu residues; each monomer of the homodimer contains 34 acidic residues. Our strategy was to begin with a synthetic gene in which all the acidic residues of ocr had been neutralised. This so-called 'positive ocr' (or pocr) was used as a template to gradually reintroduce codons for acidic residues by adapting the ISOR strategy proposed by D. S. Tawfik. After each round of mutagenesis an average of 4-6 acidic residues were incorporated into pocr. In this fashion a series of mutant libraries was generated in which acidic residues were progressively introduced into pocr. A high-throughput in vivo selection assay was developed and validated by assessing the antirestriction behaviour of a number of mutants of the DNA mimic proteins wtOcr and Orf18 ArdA. Selective screening of the libraries then allowed us to select clones that displayed antirestriction activity. These mutants were purified, and in vitro characterisation confirmed that they carry the minimum number of acidic residues deemed critical for the activity of ocr. This process effectively simulated the evolution of the charge mimicry of ocr. Moreover, we were able to tune the high-throughput assay to different selection criteria in order to elucidate various levels of functionality and unexpected changes in phenotype. This approach enables us to map the in vitro evolution of ocr and to identify the acidic residues required for protein expression, solubility and function on the way to a fully functional antirestriction protein.
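The library-generation strategy (start from a fully neutralised pocr and restore a handful of acidic residues per round) can be mimicked in a toy simulation. The sequence below is a made-up placeholder, not the real ocr sequence, and the use of Asn/Gln as the charge-neutral substitutes for Asp/Glu is an assumption for illustration.

```python
# Toy simulation of ISOR-style reintroduction of acidic residues:
# each round restores 4-6 of the original D/E positions at random.
import random

random.seed(1)

wt = "MAEFDELSEDDCKADEVYEDMRAEDLNEE"  # hypothetical segment, not the real ocr sequence
acidic_positions = [i for i, aa in enumerate(wt) if aa in "DE"]

# Fully neutralised starting point: every Asp/Glu replaced with Asn/Gln.
pocr = wt.translate(str.maketrans("DE", "NQ"))

variant = list(pocr)
for round_no in range(1, 4):
    still_neutral = [i for i in acidic_positions if variant[i] != wt[i]]
    k = min(len(still_neutral), random.randint(4, 6))
    for i in random.sample(still_neutral, k):
        variant[i] = wt[i]  # restore the original acidic residue
    restored = sum(variant[i] == wt[i] for i in acidic_positions)
    print(f"round {round_no}: {restored}/{len(acidic_positions)} acidic residues restored")
```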
15

Developing Optical Character Recognition for Ethiopic Scripts

Demissie, Fitsum January 2011 (has links)
Amharic is the official language of over 70 million people, mainly in Ethiopia. An extensive literature survey and government reports reveal that no Amharic character recognition system exists in the country. The Amharic script has 33 basic characters, each with seven orders, giving 310 distinct characters including numbers and punctuation symbols. The characters are visually similar; there is a single typeface and no capitalization. In addition, there is no standard font for using the language on computers: different stakeholders have developed their own fonts without adhering to a common standard, which creates incompatibility between fonts and documents. This project investigates why Amharic optical character recognition has not been addressed by local and international researchers and developers, and develops an Amharic optical character recognition system that uses the features and facilities of Microsoft Windows Vista or 7 with the Unicode standard.
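The "seven orders" map directly onto Unicode: in the Ethiopic block (U+1200–U+137F) the seven vowel forms of each consonant occupy consecutive code points, which is what makes a Unicode-based recognizer practical. A short illustration:

```python
# Enumerate the seven order forms of one Ethiopic base consonant.
# The seven forms sit at consecutive code points in the Ethiopic block.
BASE_LA = 0x1208  # U+1208, ETHIOPIC SYLLABLE LA

for i in range(7):
    ch = chr(BASE_LA + i)
    print(f"order {i + 1}: {ch} (U+{ord(ch):04X})")
```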
16

Proposta de arquitetura de um sistema com base em OCR neuronal para resgate e indexação de escritas paleográficas do sec. XVI ao XIX / Proposed architecture for a neural-OCR-based system for recovering and indexing paleographic writings from the 16th to the 19th century

Mendonça, Fábio Lúcio Lopes de 27 June 2008 (has links)
Master's dissertation—Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2008. / This work proposes a system architecture for the automatic processing and recognition of text in paleographic documents, using OCR (Optical Character Recognition) built on artificial neural networks. The proposed system is intended for transcribing documents written in paleographic scripts from the 16th to the 19th century: documents of colonial Brazil that were digitized from the printed originals held in the Arquivo Ultramarino de Lisboa, one of the achievements of the Projeto Resgate of the Brazilian Ministry of Culture. The architecture includes modules for segmenting the digitized document images, analysing the segments with OCR to recognize the text, training the OCR while building a dictionary of recognized words, and storing the text transcribed from the document images. To evaluate the architecture, a software prototype was developed that lets the user manually segment a document image, train a simple OCR, and use it to extract some text from a digitized paleographic document. We conclude that the proposed architecture is functional, although deeper development is still needed in the segmentation of the documents and in the recognition of 16th- to 19th-century paleographic scripts.
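The four modules the architecture names (segmentation, OCR analysis, dictionary training, storage) can be wired together as in the sketch below. This is not the author's prototype: pytesseract stands in for the neural OCR module, the line segmentation is deliberately crude, and the file names are hypothetical (a Portuguese traineddata pack is assumed installed).

```python
# Minimal pipeline: segment a page into lines, OCR each segment,
# accumulate a word dictionary, and store the transcription.
import sqlite3
import cv2
import pytesseract

def segment(path):
    """Binarise a scanned page and return per-line image segments."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    rows = (binary < 128).sum(axis=1)  # ink pixels per row
    segments, start = [], None
    for y, ink in enumerate(rows):
        if ink > 0 and start is None:
            start = y
        elif ink == 0 and start is not None:
            segments.append(binary[start:y])
            start = None
    return segments

def recognise(segment_img):
    """OCR one segment (pytesseract as a stand-in for the neural OCR)."""
    return pytesseract.image_to_string(segment_img, lang="por")

db = sqlite3.connect("resgate.db")
db.execute("CREATE TABLE IF NOT EXISTS transcriptions(doc TEXT, text TEXT)")
dictionary = set()  # "training" in the loosest sense: accumulate known words

for seg in segment("page_001.png"):
    text = recognise(seg)
    dictionary.update(text.split())
    db.execute("INSERT INTO transcriptions(doc, text) VALUES (?, ?)",
               ("page_001.png", text))
db.commit()
```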
17

Análise e Classificação de imagens para aplicação de OCR em cupons fiscais / Image analysis and classification for applying OCR to fiscal receipts

Feijó, José Victor Feijó de Araujo 13 December 2017 (has links)
Undergraduate thesis (TCC)—Universidade Federal de Santa Catarina, Centro Tecnológico, Ciências da Computação. / This work analyses the impact of a classification model, followed by digital image processing (DIP) and OCR techniques, on text extraction from fiscal receipts, classifying the receipts into subgroups. Selected image processing techniques were applied to each group according to its characteristics, and text was then extracted from the images with an OCR algorithm. Classical classification algorithms from machine learning were studied, focusing on clustering algorithms and their use for classifying images in an unsupervised learning model. The characteristics of fiscal receipt images and the image processing techniques that can be applied to them were also analysed. Possible OCR solutions for text extraction were likewise studied to understand their behaviour, making it possible to implement the proposed architecture. Methods were then developed to group the images into clusters using clustering algorithms, and three image processing techniques were proposed: the first applies a series of enhancements, the second an adaptive binarization, and the third JPEG compression. The processed images were sent to the Google Vision OCR service, which extracts the text from the images as blocks. The model was evaluated by comparing the OCR output against the actual text on the receipts, making it possible to measure the precision of each proposed technique and of the architecture as a whole. The model produced positive results, improving the extraction of the total purchase value by approximately 6%; the JPEG compression also improved the extraction of other receipt fields, such as the CNPJ and the purchase date.
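Two of the preprocessing variants (adaptive binarization and JPEG recompression) and the unsupervised grouping step can be sketched with OpenCV and scikit-learn. The feature choice, parameters and file names below are illustrative assumptions, not the ones used in the study.

```python
# Cluster receipt images by a simple brightness histogram, then apply a
# different preprocessing recipe per cluster before OCR.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def adaptive_binarise(gray):
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 15)

def jpeg_recompress(gray, quality=60):
    ok, buf = cv2.imencode(".jpg", gray, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)

paths = ["cupom_001.png", "cupom_002.png", "cupom_003.png"]  # hypothetical
grays = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in paths]

# Unsupervised grouping: a normalised 32-bin brightness histogram per image.
features = [cv2.calcHist([g], [0], None, [32], [0, 256]).ravel() / g.size
            for g in grays]
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(np.array(features))
print(dict(zip(paths, clusters)))

for g, c in zip(grays, clusters):
    # The processed image would then be sent to an OCR service
    # (Google Vision in the study above).
    processed = adaptive_binarise(g) if c == 0 else jpeg_recompress(g)
```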
18

Optické snímání a analýza bytových měřidel / Optical Recognition and Analysis for Home Metering

Machala, Petr January 2015 (has links)
This master's thesis studies and solves the problem of optically reading and analysing home utility meters for subsequent recording and statistical processing. For this application, a STMicroelectronics 32F429IDISCOVERY development board with an ARM Cortex-M4 microcontroller and an OmniVision OV7670 camera module was used, and the required firmware was implemented for it. The thesis thus presents a complete solution, from acquiring images with the camera module through final analysis and data processing, and builds a functional prototype of the proposed sensor system.
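The abstract does not spell out the recognition algorithm running on the board, so the following is only an illustrative stand-in: reading a meter's digits by template matching on a desktop, with hypothetical template files and a hypothetical fixed digit layout.

```python
# Read a meter display by matching each fixed digit window against
# pre-cropped digit templates.
import cv2

# Hypothetical digit templates, assumed no larger than the digit windows below.
templates = {d: cv2.imread(f"digit_{d}.png", cv2.IMREAD_GRAYSCALE) for d in range(10)}

def read_digit(roi):
    """Return the digit whose template best matches this region of interest."""
    scores = {d: cv2.matchTemplate(roi, t, cv2.TM_CCOEFF_NORMED).max()
              for d, t in templates.items()}
    return max(scores, key=scores.get)

meter = cv2.imread("meter.png", cv2.IMREAD_GRAYSCALE)

# Hypothetical fixed digit windows for a known meter-face layout: (x, y, w, h).
digit_boxes = [(10 + 24 * i, 40, 22, 34) for i in range(5)]
reading = "".join(str(read_digit(meter[y:y + h, x:x + w]))
                  for x, y, w, h in digit_boxes)
print("meter reading:", reading)
```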
19

Algoritmy detekce obchodních dokumentů podle šablon / Algorithms for business document detection using templates

Michalko, Jakub January 2016 (has links)
This thesis deals with the analysis and design of a system for automatic document recognition. The system examines a document and converts it into text data, preserving information about the original position of each word in the document. These data are then reviewed and some of them are assigned an importance weight; the assignment is based on rules that can vary according to user needs. Using the data, their assigned weights and the importance of their positions, the system finds a similar known document and, if one matches, identifies the document under examination.
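The matching idea, words weighted by user-configurable rules and compared against known templates, can be sketched as follows. The rule and the sample data are illustrative, not the thesis's actual rule set.

```python
# Weight each word by a position-based rule, then pick the most similar
# template by cosine similarity over the weighted bags of words.
from collections import Counter

def weigh(words_with_positions, rule):
    """words_with_positions: iterable of (word, y), y in [0, 1] from page top."""
    bag = Counter()
    for word, y in words_with_positions:
        bag[word] += rule(word, y)
    return bag

def similarity(a, b):
    """Cosine similarity of two weighted bags of words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0

# Example rule: words in the top fifth of the page (letterheads, titles)
# count three times as much.
rule = lambda word, y: 3.0 if y < 0.2 else 1.0

templates = {
    "invoice": weigh([("invoice", 0.05), ("total", 0.9), ("due", 0.5)], rule),
    "order":   weigh([("purchase", 0.05), ("order", 0.06), ("total", 0.9)], rule),
}
doc = weigh([("invoice", 0.04), ("no", 0.04), ("total", 0.85)], rule)
best = max(templates, key=lambda name: similarity(doc, templates[name]))
print("recognised as:", best)  # -> invoice
```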
20

Um estudo comparativo de métodos de segmentação de documentos antigos / A comparative study of segmentation methods of historical documents

Yanque, Nury Yuleny Arosquipa 29 November 2018 (has links)
There is a vast amount of information in old handwritten and machine-printed texts, and great efforts have been made in recent years to digitize these documents and make them available. However, Optical Character Recognition (OCR) systems have little success on such documents for a variety of reasons: defects from paper aging, faded ink, stains, uneven lighting, folds, bleed-through, ghosting, and poor contrast between text and background, among others. One important step for the success of an OCR system is good segmentation of the written part from the background of the image (binarization), and this step is particularly sensitive to the defects typical of historical documents. So much so that over the last eight years, competitions on binarization methods for historical documents have been held, advancing the state of the art in the area. In this work we carried out a comparative study of several segmentation methods for historical documents and propose a machine-learning-based method that retains the advantages of the heuristic methods. The study covered both handwritten and machine-printed historical documents, and the methods were compared with the state of the art via the standard DIBCO metrics and via an open-source OCR system. The results obtained by the proposed method are comparable with state-of-the-art methods in terms of OCR results, with advantages on some specific images.
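The kind of comparison the study performs can be sketched by scoring a global heuristic (Otsu) against a local one (Sauvola) with the pixel F-measure used in the DIBCO competitions. OpenCV and scikit-image stand in for the thesis's implementations; the file names are hypothetical.

```python
# Compare two binarization heuristics on a degraded page against a
# ground-truth binarization, using the pixel F-measure.
import cv2
import numpy as np
from skimage.filters import threshold_sauvola

gray = cv2.imread("historical_page.png", cv2.IMREAD_GRAYSCALE)
gt_ink = cv2.imread("ground_truth.png", cv2.IMREAD_GRAYSCALE) < 128  # ink = True

_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
sauvola = gray > threshold_sauvola(gray, window_size=25)  # True = background

def f_measure(binary_white_bg, gt_ink):
    """Pixel F-measure; accepts uint8 (255 = background) or bool (True = background)."""
    pred_ink = ~binary_white_bg if binary_white_bg.dtype == bool else binary_white_bg < 128
    tp = np.logical_and(pred_ink, gt_ink).sum()
    precision = tp / max(pred_ink.sum(), 1)
    recall = tp / max(gt_ink.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)

print("Otsu F-measure:   ", f_measure(otsu, gt_ink))
print("Sauvola F-measure:", f_measure(sauvola, gt_ink))
```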
