31

Advances in Document Layout Analysis

Bosch Campos, Vicente 05 March 2020 (has links)
[EN] Handwritten Text Segmentation (HTS) is a task within the Document Layout Analysis field that aims to detect and extract the different page regions of interest found in handwritten documents. HTS remains an active topic that has gained importance over the years, owing to the increasing demand to provide textual access to the myriad handwritten document collections held by archives and libraries. This thesis considers HTS a task that must be tackled in two specialized phases: detection and extraction. We see the detection phase fundamentally as a recognition problem that yields the vertical position of each region of interest as a by-product. The extraction phase then consists of calculating the best contour coordinates of each region, using the position information provided by the detection phase. Our proposed detection approach handles both higher-level regions (paragraphs, diagrams, etc.) and lower-level regions such as text lines. In the case of text line detection, we model the problem so that the vertical position yielded by the system approximates the fictitious line that connects the lower parts of the grapheme bodies in a text line, commonly known as the baseline. One of the main contributions of this thesis is that the proposed modelling approach allows us to include prior information regarding the layout of the documents being processed; this is done via a Vertical Layout Model (VLM). We develop a Hidden Markov Model (HMM) based framework to tackle region detection and classification as an integrated task, and we study the performance and ease of use of the proposed approach on many corpora. We review the modelling simplicity of our approach when processing regions at different levels of information (text lines, paragraphs, titles, etc.) and study the impact of adding deterministic and/or probabilistic prior information and restrictions via the VLM. Having a separate phase that accurately yields the detection position of each region (baselines in the case of text lines) greatly simplifies the problem that must be tackled during the extraction phase. For extraction, we propose a distance map that takes the grey-scale information of the image into consideration, which yields extraction frontiers equidistant to the adjacent text regions. We study how the accuracy of our approach scales with the quality of the provided detection positions; the extraction approach gives near-perfect results when human-reviewed baselines are provided. / Bosch Campos, V. (2020). Advances in Document Layout Analysis [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/138397 / TESIS
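The grey-scale distance map described above lends itself to a compact illustration. The sketch below is only a rough approximation of the idea, assuming a grayscale line image and the y-coordinates of two detected baselines; the cost weighting and the column-wise propagation are simplifying assumptions, not the thesis's actual algorithm.

```python
import numpy as np

def extraction_frontier(gray, baseline_top, baseline_bot):
    """Sketch: split the band between two detected baselines along a frontier
    that is (cost-)equidistant to both text regions, where crossing dark
    "ink" pixels is more expensive than crossing background."""
    h, w = gray.shape
    ink = 1.0 - gray.astype(np.float32) / 255.0      # 0 = white, 1 = black
    cost = 1.0 + 5.0 * ink                           # stepping over ink costs more

    # accumulate cost moving away from each baseline, column by column
    dist_top = np.zeros((h, w), np.float32)
    dist_bot = np.zeros((h, w), np.float32)
    for y in range(baseline_top + 1, h):
        dist_top[y] = dist_top[y - 1] + cost[y]
    for y in range(baseline_bot - 1, -1, -1):
        dist_bot[y] = dist_bot[y + 1] + cost[y]

    # per column, pick the y where the two accumulated costs are closest
    band = slice(baseline_top, baseline_bot + 1)
    return baseline_top + np.argmin(np.abs(dist_top[band] - dist_bot[band]), axis=0)
```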
32

Lost in Transcription : Evaluating Clustering and Few-Shot learning for transcription of historical ciphers

Magnifico, Giacomo January 2021 (has links)
While there has been steady development of Optical Character Recognition (OCR) techniques for printed documents, the tools that deliver good-quality transcriptions of handwritten manuscripts through Handwritten Text Recognition (HTR) methods are still some steps behind. With its main focus on historical ciphers (i.e. encrypted documents from the past with various types of symbol sets), this thesis examines the performance of two machine learning architectures developed within the DECRYPT project framework: a clustering-based unsupervised algorithm and a semi-supervised few-shot deep-learning model. Both models are tested on seen and unseen scribes to evaluate the difference in performance and the shortcomings of the two architectures, with the secondary goal of determining the influence of the datasets on performance. An in-depth analysis of the transcription results is performed, with particular focus on the Alchemic and Zodiac symbol sets, and the model performance is analysed relative to character shape and size. The results show the promising performance of few-shot architectures compared to the clustering algorithm, with respective SER averages of 0.336 (0.15 and 0.104 on seen data / 0.754 on unseen data) and 0.596 (0.638 and 0.350 on seen data / 0.8 on unseen data).
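The SER (Symbol Error Rate) figures quoted above are, in the usual definition, symbol-level edit distances normalized by the reference length; a minimal sketch of that metric follows (the exact DECRYPT evaluation script may differ in details such as space handling).

```python
def symbol_error_rate(ref, hyp):
    """Levenshtein distance between two symbol sequences, divided by the
    length of the reference transcription."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(symbol_error_rate(list("ABCD"), list("ABXD")))  # 0.25: one substitution in four
```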
33

Aktivní učení pro rozpoznávání textu / Active Learning for OCR

Kohút, Jan January 2019 (has links)
The aim of this Master's thesis is to design active learning methods and to experiment with datasets of historical documents. The large and diverse IMPACT dataset of more than one million lines is used for the experiments. I use neural networks to check the readability of lines and the correctness of their annotations. First, I compare convolutional architectures with recurrent neural networks using a bidirectional LSTM layer. Next, I study different ways of training neural networks with active learning methods, mainly using active learning to adapt neural networks to documents that are not in their original training dataset; active learning is thus used to pick appropriate adaptation data. Convolutional neural networks achieve 98.6% accuracy, recurrent neural networks 99.5%. Active learning decreases the error by 26% compared to a random pick of adaptation data.
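One common way to "pick appropriate adaptation data" is least-confidence sampling; the sketch below assumes the network can emit a confidence score per line, and is only one of several possible selection strategies, not necessarily the thesis's.

```python
import numpy as np

def pick_adaptation_lines(confidences, batch_size=64):
    """Least-confidence active learning: return the indices of the lines the
    current network is least certain about, to be annotated and added to the
    adaptation set (`confidences` holds one score per unlabelled line)."""
    return np.argsort(np.asarray(confidences))[:batch_size]

# usage: idx = pick_adaptation_lines(per_line_confidences); annotate lines[idx]
```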
34

Detekce vad s využitím smart kamery / Defect detection using smart camera

Hons, Viktor January 2021 (has links)
This thesis deals with the application of smart cameras and the verification of their functions. The first part defines the term smart camera, describes its components, and presents the most common applications, together with a review of smart cameras from different manufacturers on the market. After selecting a suitable camera model, three tasks from real industrial applications are specified – inspection of capacitor print, inspection of beer labels, and dimension measurement. The tasks are then carried out with the selected camera, including the design of the workplace layout, scene, and lighting. Finally, the reliability, success rate, and speed of the designed solution are tested.
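The thesis performs these checks with the smart camera vendor's own tools; as a rough desktop analogue, the dimension-measurement task might be sketched with OpenCV as follows (mm_per_px is a hypothetical calibration factor obtained beforehand from a target of known size).

```python
import cv2

def measure_part_width_mm(image_path, mm_per_px):
    """Sketch of a dimension check: binarize, take the largest contour as the
    inspected part, and convert its bounding-box width to millimetres."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return w * mm_per_px
```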
35

Analýza sociálních sítí využitím metod rozpoznání vzoru / Social Network Analysis using methods of pattern recognition

Križan, Viliam January 2015 (has links)
This Master's thesis deals with emotion recognition from text on social networks. It describes current feature extraction methods and the lexicons, corpora, and classifiers in use. Emotions were recognized by a classifier trained, without manually annotated data, on posts from the microblogging network Twitter. An advantage of using Twitter was the geographic scoping of the data, which makes it possible to track changes in the emotions of the population across different cities. The first classification approach was a baseline algorithm using a simple lexicon. To improve classification, a more complex SVM classifier was used in the second step. The SVM classifiers and the feature extraction and selection methods were taken from the available Python library Scikit. Data for training the classifier were collected from the USA with a purpose-built application. The classifier was trained on data labelled at collection time, without manual annotation. Two different implementations of SVM classifiers were used. The resulting classified emotions, in different cities and on different days, were displayed as coloured markers on a map.
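The Scikit-based SVM setup the abstract mentions reduces to a few lines; the sketch below uses scikit-learn's TfidfVectorizer and LinearSVC with hypothetical, distantly labelled tweets (labels assigned at collection time, as in the thesis, rather than by manual annotation).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# hypothetical tweets whose labels were attached during collection
tweets = ["so happy about the game tonight", "this traffic makes me furious"]
labels = ["joy", "anger"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["what a wonderful morning"]))   # predicted emotion label
```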
36

OCR of hand-written transcriptions of hieroglyphic text

Nederhof, Mark-Jan January 2016 (has links)
Encoding hieroglyphic texts is time-consuming. If a text already exists as a hand-written transcription, there is an alternative, namely OCR. Off-the-shelf OCR systems seem difficult to adapt to the peculiarities of Ancient Egyptian. Presented is a proof-of-concept tool designed to digitize texts of Urkunden IV in the handwriting of Kurt Sethe. It automatically recognizes signs and produces a normalized encoding, suitable for storage in a database or for rendering on screen or paper, and requires little manual correction. The encoding of hieroglyphic text is RES (Revised Encoding Scheme) rather than (common dialects of) MdC (Manuel de Codage). Earlier papers argued against MdC and in favour of RES for corpus development; arguments in favour of RES include the longevity of the encoding, as its semantics are font-independent. The present study provides evidence that RES is also much preferable to MdC in the context of OCR: with a well-understood parsing technique, the relative positioning of scanned signs can be straightforwardly mapped to suitable primitives of the encoding.
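The closing claim — that relative sign positions map straightforwardly to encoding primitives — can be illustrated with a toy rule over bounding boxes. The operator names below are placeholders for illustration only, not actual RES syntax.

```python
def compose(a_box, b_box):
    """Toy mapping from the relative position of two scanned signs (x, y,
    width, height boxes) to a composition decision in a hierarchical
    hieroglyphic encoding."""
    ax, ay, aw, ah = a_box
    bx, by, bw, bh = b_box
    if by >= ay + ah:        # b entirely below a
        return "stack-vertically"
    if bx >= ax + aw:        # b entirely to the right of a
        return "join-horizontally"
    return "overlay"         # overlapping boxes: insertion/ligature case

print(compose((0, 0, 10, 10), (12, 0, 10, 10)))   # join-horizontally
```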
37

Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing.

Al-Muhtaseb, Husni A. January 2010 (has links)
Arabic text recognition has not been researched as thoroughly as that of other natural languages, yet the need for automatic Arabic text recognition is clear. In addition to traditional applications like postal address reading, cheque verification in banks, and office automation, there is large interest in searching scanned documents available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase of text readers for visually impaired people, and understanding filled forms. This research work aims to contribute to current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes that advance the performance of state-of-the-art Arabic OCR systems. A statistical and analytical analysis of Arabic text was carried out to estimate the occurrence probabilities of Arabic characters for use with Hidden Markov Models (HMMs) and other techniques. Since there is no publicly available dataset of printed Arabic text for recognition purposes, it was decided to create one. In addition, a minimal Arabic script is proposed; it contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time. Based on the success of using HMMs for speech and text recognition, their use for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMMs; finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected. The developed techniques need no word or character segmentation before the classification phase, as segmentation is a by-product of recognition. This seems to be the most advantageous feature of using HMMs for Arabic text, since segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase; the recognition rates ranged from 98% to 99.9%, depending on the font. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages: a proof-of-concept experiment on English characters achieved a recognition rate of 98.9% using the same HMM setup, and the same techniques applied to Bangla characters achieved a recognition rate above 95%. The recognition of printed Arabic text with multiple fonts was also conducted using the same technique: fonts were categorized into different groups, and new high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing; this module increased the recognition rate by more than 1%. / King Fahd University of Petroleum and Minerals (KFUPM)
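The segmentation-free property comes from treating each line image as a left-to-right sequence of narrow frames, much as in HMM speech recognition. A minimal sketch of such sliding-window feature extraction follows; the thesis's actual feature set is more elaborate than this column average.

```python
import numpy as np

def sliding_window_features(line_img, win=3, step=1):
    """Turn a (height x width) text-line image into a sequence of frame
    vectors for an HMM, so no explicit character segmentation is needed."""
    h, w = line_img.shape
    frames = [line_img[:, x:x + win].mean(axis=1)      # one vector per window
              for x in range(0, w - win + 1, step)]
    return np.stack(frames)                            # shape: (num_frames, h)
```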
38

Study of augmentations on historical manuscripts using TrOCR

Meoded, Erez 08 December 2023 (has links) (PDF)
Historical manuscripts are an essential source of original content, but for many reasons it is hard to recognize these manuscripts as text. This thesis used a state-of-the-art handwritten text recognizer, TrOCR, to recognize a 16th-century manuscript. TrOCR uses a vision transformer to encode the input images and a language transformer to decode them back to text. We showed that carefully preprocessed images and well-designed augmentations can improve the performance of TrOCR, and we suggest an ensemble of augmented models to achieve even better performance.
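Reproducing this kind of pipeline with the public TrOCR checkpoints is straightforward; the sketch below uses the Hugging Face transformers API with a handful of torchvision augmentations chosen for illustration — the thesis's own preprocessing and augmentation design differs, and the input file is hypothetical.

```python
from PIL import Image
import torchvision.transforms as T
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# illustrative train-time augmentations for degraded historical pages
augment = T.Compose([
    T.RandomRotation(2),
    T.ColorJitter(brightness=0.3, contrast=0.3),
    T.RandomPerspective(distortion_scale=0.1, p=0.5),
])

image = augment(Image.open("line_image.png").convert("RGB"))  # hypothetical file
pixel_values = processor(images=image, return_tensors="pt").pixel_values
ids = model.generate(pixel_values)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```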
39

Разработка системы автоматического распознавания автомобильных номеров в реальных дорожных условиях : магистерская диссертация / Development of a system for automatic recognition of license plates in real road conditions

Зайкис, Д. В., Zaikis, D. V. January 2023 (has links)
The goal of this work is to develop an automatic system for recognizing car license plates in natural road conditions, including difficult weather and physical conditions such as poor visibility, dirt, and intentional or unintentional partial occlusion of characters. The object of study is digital images of cars in their natural environment. Research methods: convolutional neural networks, including single-stage object detectors (SSOD), combinations of networks with intermediate connections between layers (Cross Stage Partial Network, CSPNet) and networks that aggregate information from different levels of the network (Path Aggregation Network, PANet); image transformations with the OpenCV library, including Sobel and Gaussian filters and the Canny transform; and deep learning methods for sequence processing: LSTM, CRNN, CRAFT. As part of this work, a license plate recognition system was developed that converts graphic data from a digital image or video stream into text files of various formats. The task of detecting license plates in images is solved with the YoLo v5 deep neural network, a modern object detection model built on an architecture that uses CSPNet and PANet. It provides high speed and accuracy in detecting objects in images, and thanks to its efficiency and scalability it has become a popular choice for computer vision tasks in various fields. To recognize the text on detected objects, an object detection algorithm based on the Canny transform and Sobel and Gaussian filters is used together with the keras-ocr neural network, built on the Keras framework, which combines a convolutional neural network (CNN) and a recurrent neural network (RNN) to recognize printed text. The resulting method correctly recognizes 85% of the provided license plates, mainly of the Russian standard. The functionality can be integrated into existing photo or video traffic-enforcement systems and used in the digitalization of tracking, access control, and road and transport infrastructure security systems. The theoretical and descriptive parts of this graduate qualification work were prepared in Microsoft Word; the practical part was carried out in a Jupyter notebook on the Google Colaboratory cloud computing platform.
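The detection-plus-recognition pipeline described above can be sketched with public building blocks; here the stock yolov5s checkpoint stands in for the custom plate-trained weights, keras-ocr supplies a CRAFT detector and CRNN recognizer as mentioned in the abstract, and the input image name is hypothetical.

```python
import torch
import keras_ocr

detector = torch.hub.load("ultralytics/yolov5", "yolov5s")  # custom weights in practice
recognizer = keras_ocr.pipeline.Pipeline()                  # CRAFT + CRNN

results = detector("car.jpg")                 # hypothetical input image
for crop in results.crop(save=False):         # one dict per detected box
    words = recognizer.recognize([crop["im"][..., ::-1]])[0]  # BGR -> RGB
    print(" ".join(text for text, box in words))
```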
40

Mobilní systém pro rozpoznání textu na Androidu / Mobile System for Text Recognition on Android

Tomešek, Jan January 2017 (has links)
This thesis deals with the creation of a mobile library for preprocessing images containing text, which forms part of a text recognition system. The library is realized with emphasis on generality of use, efficiency, and portability. It provides a set of algorithms primarily for image quality assessment and text detection; these algorithms enable a substantial decrease in the volume of transmitted data and speed up and refine the recognition process. An example application for the Android platform, able to analyze the composition of foods stated on their packaging, was created as well. Overall, the library simplifies the development of mobile applications focused on text extraction and analysis, while the mobile application provides a convenient way of verifying the harmfulness of foods. The thesis offers the reader an overview of current solutions and tools available in this field, provides a breakdown of important image preprocessing algorithms, and guides the reader through the construction of the library and the application for mobile devices.
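Image quality assessment of the kind the library performs often starts with a focus measure; a classic one is the variance of the Laplacian, sketched below in Python/OpenCV (the thesis library targets Android and its thresholding is its own — the value here is an assumed placeholder).

```python
import cv2

def sharp_enough(image_bgr, threshold=100.0):
    """Reject blurry frames before uploading them for recognition: low
    Laplacian variance indicates an out-of-focus image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= threshold
```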
