• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 23
  • 12
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 51
  • 51
  • 20
  • 18
  • 17
  • 15
  • 14
  • 12
  • 11
  • 10
  • 10
  • 8
  • 8
  • 8
  • 8
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Rozpoznávání historických textů pomocí hlubokých neuronových sítí / Convolutional Networks for Historic Text Recognition

Kišš, Martin January 2018 (has links)
The aim of this work is to create a tool for automatic transcription of historical documents. The work is mainly focused on the recognition of texts from the period of modern times written using font Fraktur. The problem is solved with a newly designed recurrent convolutional neural networks and a Spatial Transformer Network. Part of the solution is also an implemented generator of artificial historical texts. Using this generator, an artificial data set is created on which the convolutional neural network for line recognition is trained. This network is then tested on real historical lines of text on which the network achieves up to 89.0 % of character accuracy. The contribution of this work is primarily the newly designed neural network for text line recognition and the implemented artificial text generator, with which it is possible to train the neural network to recognize real historical lines of text.
42

Využití hlubokého učení pro rozpoznání textu v obrazu grafického uživatelského rozhraní / Deep Learning for OCR in GUI

Hamerník, Pavel January 2019 (has links)
Optical character recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into a sequence of characters. Despite decades of intense research, OCR systems with capabilities to that of human still remains an open challenge. In this work there is presented a design and implementation of such system, which is capable of detecting texts in graphical user interfaces.
43

Využití hlubokého učení pro rozpoznání textu v obrazu grafického uživatelského rozhraní / Deep Learning for OCR in GUI

Hamerník, Pavel January 2019 (has links)
Optical character recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into a sequence of characters. Despite decades of intense research, OCR systems with capabilities to that of human still remains an open challenge. In this work there is presented a design and implementation of such system, which is capable of detecting texts in graphical user interfaces.
44

Active Learning pro zpracování archivních pramenů / Active Learning for Processing of Archive Sources

Hříbek, David January 2021 (has links)
This work deals with the creation of a system that allows uploading and annotating scans of historical documents and subsequent active learning of models for character recognition (OCR) on available annotations (marked lines and their transcripts). The work describes the process, classifies the techniques and presents an existing system for character recognition. Above all, emphasis is placed on machine learning methods. Furthermore, the methods of active learning are explained and a method of active learning of available OCR models from annotated scans is proposed. The rest of the work deals with a system design, implementation, available datasets, evaluation of self-created OCR model and testing of the entire system.
45

Creation of a Next-Generation Standardized Drug Groupingfor QT Prolonging Reactions using Machine Learning Techniques

Tiensuu, Jacob, Rådahl, Elsa January 2021 (has links)
This project aims to support pharmacovigilance, the science and activities relating to drug-safety and prevention of adverse drug reactions (ADRs). We focus on a specific ADR called QT prolongation, a serious reaction affecting the heartbeat. Our main goal is to group medicinal ingredients that might cause QT prolongation. This grouping can be used in safety analysis and for exclusion lists in clinical studies. It should preferably be ranked according to level of suspected correlation. We wished to create an automated and standardised process. Drug safety-related reports describing patients' experienced ADRs and what medicinal products they have taken are collected in a database called VigiBase, that we have used as source for ingredient extraction. The ADRs are described in free-texts and coded using an international standardised terminology. This helps us to process the data and filter ingredients included in a report that describes QT prolongation. To broaden our project scope to include uncoded data, we extended the process to use free-text verbatims describing the ADR as input. By processing and filtering the free-text data and training a classification model for natural language processing released by Google on VigiBase data, we were able to predict if a free-text verbatim is describing QT prolongation. The classification resulted in an F1-score of 98%. For the ingredients extracted from VigiBase, we wanted to validate if there is a known connection to QT prolongation. The VigiBase occurrences is a parameter to consider, but it might be misleading since a report can include several drugs, and a drug can include several ingredients, making it hard to validate the cause. For validation, we used product labels connected to each ingredient of interest. We used a tool to download, scan and code product labels in order to see which ones mention QT prolongation. To rank our final list of ingredients according to level of suspected QT prolongation correlation, we used a multinomial logistic regression model. As training data, we used a data subset manually labeled by pharmacists. Used on unlabeled validation data, the model accuracy was 68%. Analyzing the training data showed that it was not easily separated linearly explaining the limited classification performance. The final ranked list of ingredients suspected to cause QT prolongation consists of 1086 ingredients.
46

Contributions to the joint segmentation and classification of sequences (My two cents on decoding and handwriting recognition)

España Boquera, Salvador 05 April 2016 (has links)
[EN] This work is focused on problems (like automatic speech recognition (ASR) and handwritten text recognition (HTR)) that: 1) can be represented (at least approximately) in terms of one-dimensional sequences, and 2) solving these problems entails breaking the observed sequence down into segments which are associated to units taken from a finite repertoire. The required segmentation and classification tasks are so intrinsically interrelated ("Sayre's Paradox") that they have to be performed jointly. We have been inspired by what some works call the "successful trilogy", which refers to the synergistic improvements obtained when considering: - a good formalization framework and powerful algorithms; - a clever design and implementation taking the best profit of hardware; - an adequate preprocessing and a careful tuning of all heuristics. We describe and study "two stage generative models" (TSGMs) comprising two stacked probabilistic generative stages without reordering. This model not only includes Hidden Markov Models (HMMs, but also "segmental models" (SMs). "Two stage decoders" may be deduced by simply running a TSGM in reversed way, introducing non determinism when required: 1) A directed acyclic graph (DAG) is generated and 2) it is used together with a language model (LM). One-pass decoders constitute a particular case. A formalization of parsing and decoding in terms of semiring values and language equations proposes the use of recurrent transition networks (RTNs) as a normal form for Context Free Grammars (CFGs), using them in a parsing-as-composition paradigm, so that parsing CFGs result in a slight extension of regular ones. Novel transducer composition algorithms have been proposed that can work with RTNs and can deal with null transitions without resorting to filter-composition even in the presence of null transitions and non-idempotent semirings. A review of LMs is described and some contributions mainly focused on LM interfaces, LM representation and on the evaluation of Neural Network LMs (NNLMs) are provided. A review of SMs includes the combination of generative and discriminative segmental models and general scheme of frame emission and another one of SMs. Some fast cache-friendly specialized Viterbi lexicon decoders taking profit of particular HMM topologies are proposed. They are able to manage sets of active states without requiring dictionary look-ups (e.g. hashing). A dataflow architecture allowing the design of flexible and diverse recognition systems from a little repertoire of components has been proposed, including a novel DAG serialization protocol. DAG generators can take over-segmentation constraints into account, make use SMs other than HMMs, take profit of the specialized decoders proposed in this work and use a transducer model to control its behavior making it possible, for instance, to use context dependent units. Relating DAG decoders, they take profit of a general LM interface that can be extended to deal with RTNs. Some improvements for one pass decoders are proposed by combining the specialized lexicon decoders and the "bunch" extension of the LM interface, including an adequate parallelization. The experimental part is mainly focused on HTR tasks on different input modalities (offline, bimodal). We have proposed some novel preprocessing techniques for offline HTR which replace classical geometrical heuristics and make use of automatic learning techniques (neural networks). Experiments conducted on the IAM database using this new preprocessing and HMM hybridized with Multilayer Perceptrons (MLPs) have obtained some of the best results reported for this reference database. Among other HTR experiments described in this work, we have used over-segmentation information, tried lexicon free approaches, performed bimodal experiments and experimented with the combination of hybrid HMMs with holistic classifiers. / [ES] Este trabajo se centra en problemas (como reconocimiento automático del habla (ASR) o de escritura manuscrita (HTR)) que cumplen: 1) pueden representarse (quizás aproximadamente) en términos de secuencias unidimensionales, 2) su resolución implica descomponer la secuencia en segmentos que se pueden clasificar en un conjunto finito de unidades. Las tareas de segmentación y de clasificación necesarias están tan intrínsecamente interrelacionadas ("paradoja de Sayre") que deben realizarse conjuntamente. Nos hemos inspirado en lo que algunos autores denominan "La trilogía exitosa", refereido a la sinergia obtenida cuando se tiene: - un buen formalismo, que dé lugar a buenos algoritmos; - un diseño e implementación ingeniosos y eficientes, que saquen provecho de las características del hardware; - no descuidar el "saber hacer" de la tarea, un buen preproceso y el ajuste adecuado de los diversos parámetros. Describimos y estudiamos "modelos generativos en dos etapas" sin reordenamientos (TSGMs), que incluyen no sólo los modelos ocultos de Markov (HMM), sino también modelos segmentales (SMs). Se puede obtener un decodificador de "dos pasos" considerando a la inversa un TSGM introduciendo no determinismo: 1) se genera un grafo acíclico dirigido (DAG) y 2) se utiliza conjuntamente con un modelo de lenguaje (LM). El decodificador de "un paso" es un caso particular. Se formaliza el proceso de decodificación con ecuaciones de lenguajes y semianillos, se propone el uso de redes de transición recurrente (RTNs) como forma normal de gramáticas de contexto libre (CFGs) y se utiliza el paradigma de análisis por composición de manera que el análisis de CFGs resulta una extensión del análisis de FSA. Se proponen algoritmos de composición de transductores que permite el uso de RTNs y que no necesita recurrir a composición de filtros incluso en presencia de transiciones nulas y semianillos no idempotentes. Se propone una extensa revisión de LMs y algunas contribuciones relacionadas con su interfaz, con su representación y con la evaluación de LMs basados en redes neuronales (NNLMs). Se ha realizado una revisión de SMs que incluye SMs basados en combinación de modelos generativos y discriminativos, así como un esquema general de tipos de emisión de tramas y de SMs. Se proponen versiones especializadas del algoritmo de Viterbi para modelos de léxico y que manipulan estados activos sin recurrir a estructuras de tipo diccionario, sacando provecho de la caché. Se ha propuesto una arquitectura "dataflow" para obtener reconocedores a partir de un pequeño conjunto de piezas básicas con un protocolo de serialización de DAGs. Describimos generadores de DAGs que pueden tener en cuenta restricciones sobre la segmentación, utilizar modelos segmentales no limitados a HMMs, hacer uso de los decodificadores especializados propuestos en este trabajo y utilizar un transductor de control que permite el uso de unidades dependientes del contexto. Los decodificadores de DAGs hacen uso de un interfaz bastante general de LMs que ha sido extendido para permitir el uso de RTNs. Se proponen también mejoras para reconocedores "un paso" basados en algoritmos especializados para léxicos y en la interfaz de LMs en modo "bunch", así como su paralelización. La parte experimental está centrada en HTR en diversas modalidades de adquisición (offline, bimodal). Hemos propuesto técnicas novedosas para el preproceso de escritura que evita el uso de heurísticos geométricos. En su lugar, utiliza redes neuronales. Se ha probado con HMMs hibridados con redes neuronales consiguiendo, para la base de datos IAM, algunos de los mejores resultados publicados. También podemos mencionar el uso de información de sobre-segmentación, aproximaciones sin restricción de un léxico, experimentos con datos bimodales o la combinación de HMMs híbridos con reconocedores de tipo holístico. / [CA] Aquest treball es centra en problemes (com el reconeiximent automàtic de la parla (ASR) o de l'escriptura manuscrita (HTR)) on: 1) les dades es poden representar (almenys aproximadament) mitjançant seqüències unidimensionals, 2) cal descompondre la seqüència en segments que poden pertanyer a un nombre finit de tipus. Sovint, ambdues tasques es relacionen de manera tan estreta que resulta impossible separar-les ("paradoxa de Sayre") i s'han de realitzar de manera conjunta. Ens hem inspirat pel que alguns autors anomenen "trilogia exitosa", referit a la sinèrgia obtinguda quan prenim en compte: - un bon formalisme, que done lloc a bons algorismes; - un diseny i una implementació eficients, amb ingeni, que facen bon us de les particularitats del maquinari; - no perdre de vista el "saber fer", emprar un preprocés adequat i fer bon us dels diversos paràmetres. Descrivim i estudiem "models generatiu amb dues etapes" sense reordenaments (TSGMs), que inclouen no sols inclouen els models ocults de Markov (HMM), sinò també models segmentals (SM). Es pot obtindre un decodificador "en dues etapes" considerant a l'inrevés un TSGM introduint no determinisme: 1) es genera un graf acíclic dirigit (DAG) que 2) és emprat conjuntament amb un model de llenguatge (LM). El decodificador "d'un pas" en és un cas particular. Descrivim i formalitzem del procés de decodificació basada en equacions de llenguatges i en semianells. Proposem emprar xarxes de transició recurrent (RTNs) com forma normal de gramàtiques incontextuals (CFGs) i s'empra el paradigma d'anàlisi sintàctic mitjançant composició de manera que l'anàlisi de CFGs resulta una lleugera extensió de l'anàlisi de FSA. Es proposen algorismes de composició de transductors que poden emprar RTNs i que no necessiten recorrer a la composició amb filtres fins i tot amb transicions nul.les i semianells no idempotents. Es proposa una extensa revisió de LMs i algunes contribucions relacionades amb la seva interfície, amb la seva representació i amb l'avaluació de LMs basats en xarxes neuronals (NNLMs). S'ha realitzat una revisió de SMs que inclou SMs basats en la combinació de models generatius i discriminatius, així com un esquema general de tipus d'emissió de trames i altre de SMs. Es proposen versions especialitzades de l'algorisme de Viterbi per a models de lèxic que permeten emprar estats actius sense haver de recórrer a estructures de dades de tipus diccionari, i que trauen profit de la caché. S'ha proposat una arquitectura de flux de dades o "dataflow" per obtindre diversos reconeixedors a partir d'un xicotet conjunt de peces amb un protocol de serialització de DAGs. Descrivim generadors de DAGs capaços de tindre en compte restriccions sobre la segmentació, emprar models segmentals no limitats a HMMs, fer us dels decodificadors especialitzats proposats en aquest treball i emprar un transductor de control que permet emprar unitats dependents del contexte. Els decodificadors de DAGs fan us d'una interfície de LMs prou general que ha segut extesa per permetre l'ús de RTNs. Es proposen millores per a reconeixedors de tipus "un pas" basats en els algorismes especialitzats per a lèxics i en la interfície de LMs en mode "bunch", així com la seua paral.lelització. La part experimental està centrada en el reconeiximent d'escriptura en diverses modalitats d'adquisició (offline, bimodal). Proposem un preprocés d'escriptura manuscrita evitant l'us d'heurístics geomètrics, en el seu lloc emprem xarxes neuronals. S'han emprat HMMs hibridats amb xarxes neuronals aconseguint, per a la base de dades IAM, alguns dels millors resultats publicats. També podem mencionar l'ús d'informació de sobre-segmentació, aproximacions sense restricció a un lèxic, experiments amb dades bimodals o la combinació de HMMs híbrids amb classificadors holístics. / España Boquera, S. (2016). Contributions to the joint segmentation and classification of sequences (My two cents on decoding and handwriting recognition) [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62215 / TESIS / Premios Extraordinarios de tesis doctorales
47

On dysgraphia diagnosis support via the automation of the BVSCO test scoring : Leveraging deep learning techniques to support medical diagnosis of dysgraphia / Om dysgrafi diagnosstöd via automatisering av BVSCO-testpoäng : Utnyttja tekniker för djupinlärning för att stödja medicinsk diagnos av dysgrafi

Sommaruga, Riccardo January 2022 (has links)
Dysgraphia is a rather widespread learning disorder in the current society. It is well established that an early diagnosis of this writing disorder can lead to improvement in writing skills. However, as of today, although there is no comprehensive standard process for the evaluation of dysgraphia, most of the tests used for this purpose must be done at a physician’s office. On the other hand, the pandemic triggered by COVID-19 has forced people to stay at home and opened the door to the development of online medical consultations. The present study therefore aims to propose an automated pipeline to provide pre-clinical diagnosis of dysgraphia. In particular, it investigates the possibility of applying deep learning techniques to the most widely used test for assessing writing difficulties in Italy, the BVSCO-2. This test consists of several writing exercises to be performed by the child on paper under the supervision of a doctor. To test the hypothesis that it is possible to enable children to have their writing impairment recognized even at a distance, an innovative system has been developed. It leverages an already developed customized tablet application that captures the graphemes produced by the child and an artificial neural network that processes the images and recognizes the handwritten text. The experimental results were analyzed using different methods and were compared with the actual diagnosis that a doctor would have provided if the test had been carried out normally. It turned out that, despite a slight fixed bias introduced by the machine for some specific exercises, these results seemed very promising in terms of both handwritten text recognition and diagnosis of children with dysgraphia, thus giving a satisfactory answer to the proposed research question. / Dysgrafi är en ganska utbredd inlärningsstörning i dagens samhälle. Det är väl etablerat att en tidig diagnos av denna skrivstörning kan leda till en förbättring av skrivförmågan. Även om det i dag inte finns någon omfattande standardprocess för utvärdering av dysgrafi måste dock de flesta av de tester som används för detta ändamål göras på en läkarmottagning. Å andra sidan har den pandemi som utlöstes av COVID-19 tvingat människor att stanna hemma och öppnat dörren för utvecklingen av medicinska konsultationer online. Syftet med denna studie är därför att föreslå en automatiserad pipeline för att ge preklinisk diagnos av dysgrafi. I synnerhet undersöks möjligheten att tillämpa djupinlärningstekniker på det mest använda testet för att bedöma skrivsvårigheter i Italien, BVSCO-2. Testet består av flera skrivövningar som barnet ska utföra på papper under överinseende av en läkare. För att testa hypotesen att det är möjligt att göra det möjligt för barn att få sina skrivsvårigheter erkända även på distans har ett innovativt system utvecklats. Det utnyttjar en redan utvecklad skräddarsydd applikation för surfplattor som fångar de grafem som barnet producerar och ett artificiellt neuralt nätverk som bearbetar bilderna och känner igen den handskrivna texten. De experimentella resultaten analyserades med hjälp av olika metoder och jämfördes med den faktiska diagnos som en läkare skulle ha ställt om testet hade utförts normalt. Det visade sig att, trots en liten fast bias som maskinen införde för vissa specifika övningar, verkade dessa resultat mycket lovande när det gäller både igenkänning av handskriven text och diagnos av barn med dysgrafi, vilket gav ett tillfredsställande svar på den föreslagna forskningsfrågan.
48

Methods for Text Segmentation from Scene Images

Kumar, Deepak January 2014 (has links) (PDF)
Recognition of text from camera-captured scene/born-digital images help in the development of aids for the blind, unmanned navigation systems and spam filters. However, text in such images is not confined to any page layout, and its location within in the image is random in nature. In addition, motion blur, non-uniform illumination, skew, occlusion and scale-based degradations increase the complexity in locating and recognizing the text in a scene/born-digital image. Text localization and segmentation techniques are proposed for the born-digital image data set. The proposed OTCYMIST technique won the first place and placed in the third position for its performance on the text segmentation task in ICDAR 2011 and ICDAR 2013 robust reading competitions for born-digital image data set, respectively. Here, Otsu’s binarization and Canny edge detection are separately carried out on the three colour planes of the image. Connected components (CC’s) obtained from the segmented image are pruned based on thresholds applied on their area and aspect ratio. CC’s with sufficient edge pixels are retained. The centroids of the individual CC’s are used as nodes of a graph. A minimum spanning tree is built using these nodes of the graph. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is used to remove likely non-text components. CC’s are grouped based on their proximity in the horizontal direction to generate bounding boxes (BB’s) of text strings. Overlapping BB’s are removed using an overlap area threshold. Non-overlapping and minimally overlapping BB’s are used for text segmentation. These BB’s are split vertically to localize text at the word level. A word cropped from a document image can easily be recognized using a traditional optical character recognition (OCR) engine. However, recognizing a word, obtained by manually cropping a scene/born-digital image, is not trivial. Existing OCR engines do not handle these kinds of scene word images effectively. Our intention is to first segment the word image and then pass it to the existing OCR engines for recognition. In two aspects, it is advantageous: it avoids building a character classifier from scratch and reduces the word recognition task to a word segmentation task. Here, we propose two bottom-up approaches for the task of word segmentation. These approaches choose different features at the initial stage of segmentation. Power-law transform (PLT) was applied to the pixels of the gray scale born-digital images to non-linearly modify the histogram. The recognition rate achieved on born-digital word images is 82.9%, which is 20% more than the top performing entry (61.5%) in ICDAR 2011 robust reading competition. In addition, we explored applying PLT to the colour planes such as red, green, blue, intensity and lightness plane by varying the gamma value. We call this technique as Nonlinear enhancement and selection of plane (NESP) for optimal segmentation, which is an improvement over PLT. NESP chooses a particular plane with a proper gamma value based on Fisher discrimination factor. The recognition rate is 72.8% for scene images of ICDAR 2011 robust reading competition, which is 30% higher than the best entry (41.2%). The recognition rate is 81.7% and 65.9% for born-digital and scene images of ICDAR 2013 robust reading competition, respectively, using NESP. Another technique, midline analysis and propagation of segmentation (MAPS), has also been proposed. Here, the middle row pixels of the gray scale image are first segmented and the statistics of the segmented pixels are used to assign text and non-text labels to the rest of the image pixels using min-cut method. Gaussian model is fitted on the middle row segmented pixels before the assignment of other pixels. In MAPS, we assume the middle row pixels are least affected by any of the degradations. This assumption is validated by the good word recognition rate of 71.7% on ICDAR 2011 robust reading competition for scene images. The recognition rate is 83.8% and 66.0% for born-digital and scene images of ICDAR 2013 robust reading competition, respectively, using MAPS. The best reported results for ICDAR 2003 word images is 61.1% using custom lexicons containing the list of test words. On the other hand, NESP and MAPS achieve 66.2% and 64.5% for ICDAR 2003 word images without using any lexicon. By using similar custom lexicon, the recognition rates for ICDAR 2003 word images go up to 74.9% and 74.2% for NESP and MAPS methods, respectively. In place of passing an image segmented by a method, manually segmented word image is submitted to an OCR engine for benchmarking maximum possible recognition rate for each database. The recognition rates of the proposed methods and the benchmark results are reported on the seven publicly available word image data sets and compared with these of reported results in the literature. Since no good Kannada OCR is available, a classifier is designed to recognize Kannada characters and words from Chars74k data set and our own image collection, respectively. Discrete cosine transform (DCT) and block DCT are used as features to train separate classifiers. Kannada words are segmented using the same techniques (MAPS and NESP) and further segmented into groups of components, since a Kannada character may be represented by a single component or a group of components in an image. The recognition rate on Kannada words is reported for different features with and without the use of a lexicon. The obtained recognition performance for Kannada character recognition (11.4%) is three times the best performance (3.5%) reported in the literature.
49

Rozpoznávání textu pomocí konvolučních sítí / Optical Character Recognition Using Convolutional Networks

Csóka, Pavel January 2016 (has links)
This thesis aims at creation of new datasets for text recognition machine learning tasks and experiments with convolutional neural networks on these datasets. It describes architecture of convolutional nets, difficulties of recognizing text from photographs and contemporary works using these networks. Next, creation of annotation, using Tesseract OCR, for dataset comprised from photos of document pages, taken by mobile phones, named Mobile Page Photos. From this dataset two additional are created by cropping characters out of its photos formatted as Street View House Numbers dataset. Dataset Mobile Nice Page Photos Characters contains readable characters and Mobile Page Photos Characters adds hardly readable and unreadable ones. Three models of convolutional nets are created and used for text recognition experiments on these datasets, which are also used for estimation of annotation error.
50

Layout Analysis for Handwritten Documents. A Probabilistic Machine Learning Approach

Quirós Díaz, Lorenzo 21 March 2022 (has links)
[ES] El Análisis de la Estructura de Documentos (Document Layout Analysis), aplicado a documentos manuscritos, tiene como objetivo obtener automáticamente la estructura intrínseca de dichos documentos. Su desarrollo como campo de investigación se extiende desde los sistemas de segmentación de caracteres desarrollados a principios de la década de 1960 hasta los sistemas complejos desarrollados en la actualidad, donde el objetivo es analizar estructuras de alto nivel (líneas de texto, párrafos, tablas, etc.) y la relación que existe entre ellas. Esta tesis, en primer lugar, define el objetivo del Análisis de la Estructura de Documentos desde una perspectiva probabilística. A continuación, la complejidad del problema se reduce a un conjunto de subproblemas complementarios bien conocidos, de manera que pueda ser gestionado por medio de recursos informáticos modernos. Concretamente se abordan tres de los principales problemas del Análisis de la Estructura de Documentos siguiendo una formulación probabilística. Específicamente se aborda la Detección de Línea Base (Baseline Detection), la Segmentación de Regiones (Region Segmentation) y la Determinación del Orden de Lectura (Reading Order Determination). Uno de los principales aportes de esta tesis es la formalización de los problemas de Detección de Línea Base y Segmentación de Regiones bajo un marco probabilístico, donde ambos problemas pueden ser abordados por separado o de forma integrada por los modelos propuestos. Este último enfoque ha demostrado ser muy útil para procesar grandes colecciones de documentos con recursos informáticos limitados. Posteriormente se aborda el subproblema de la Determinación del Orden de Lectura, que es uno de los subproblemas más importantes, aunque subestimados, del Análisis de la Extructura de Documentos, ya que es el nexo que permite convertir los datos extraídos de los sistemas de Reconocimiento Automático de Texto (Automatic Text Recognition Systems) en información útil. Por lo tanto, en esta tesis abordamos y formalizamos la Determinación del Orden de Lectura como un problema de clasificación probabilística por pares. Además, se proponen dos diferentes algoritmos de decodificación que reducen la complejidad computacional del problema. Por otra parte, se utilizan diferentes modelos estadísticos para representar la distribución de probabilidad sobre la estructura de los documentos. Estos modelos, basados en Redes Neuronales Artificiales (desde un simple Perceptrón Multicapa hasta complejas Redes Convolucionales y Redes de Propuesta de Regiones), se estiman a partir de datos de entrenamiento utilizando algoritmos de aprendizaje automático supervisados. Finalmente, todas las contribuciones se evalúan experimentalmente, no solo en referencias académicas estándar, sino también en colecciones de miles de imágenes. Se han considerado documentos de texto manuascritos y documentos musicales manuscritos, ya que en conjunto representan la mayoría de los documentos presentes en bibliotecas y archivos. Los resultados muestran que los métodos propuestos son muy precisos y versátiles en una amplia gama de documentos manuscritos. / [CA] L'Anàlisi de l'Estructura de Documents (Document Layout Analysis), aplicada a documents manuscrits, pretén automatitzar l'obtenció de l'estructura intrínseca d'un document. El seu desenvolupament com a camp d'investigació comprén des dels sistemes de segmentació de caràcters creats al principi dels anys 60 fins als complexos sistemes de hui dia que busquen analitzar estructures d'alt nivell (línies de text, paràgrafs, taules, etc) i les relacions entre elles. Aquesta tesi busca, primer de tot, definir el propòsit de l'anàlisi de l'estructura de documents des d'una perspectiva probabilística. Llavors, una vegada reduïda la complexitat del problema, es processa utilitzant recursos computacionals moderns, per a dividir-ho en un conjunt de subproblemes complementaris més coneguts. Concretament, tres dels principals subproblemes de l'Anàlisi de l'Estructura de Documents s'adrecen seguint una formulació probabilística: Detecció de la Línia Base Baseline Detection), Segmentació de Regions (Region Segmentation) i Determinació de l'Ordre de Lectura (Reading Order Determination). Una de les principals contribucions d'aquesta tesi és la formalització dels problemes de la Detecció de les Línies Base i dels de Segmentació de Regions en un entorn probabilístic, sent els dos problemes tractats per separat o integrats en conjunt pels models proposats. Aquesta última aproximació ha demostrat ser de molta utilitat per a la gestió de grans col·leccions de documents amb uns recursos computacionals limitats. Posteriorment s'ha adreçat el subproblema de la Determinació de l'Ordre de Lectura, sent un dels subproblemes més importants de l'Anàlisi d'Estructures de Documents, encara així subestimat, perquè és el nexe que permet transformar en informació d'utilitat l'extracció de dades dels sistemes de reconeixement automàtic de text. És per això que el fet de determinar l'ordre de lectura s'adreça i formalitza com un problema d'ordenació probabilística per parells. A més, es proposen dos algoritmes descodificadors diferents que reducix la complexitat computacional del problema. Per altra banda s'utilitzen diferents models estadístics per representar la distribució probabilística sobre l'estructura dels documents. Aquests models, basats en xarxes neuronals artificials (des d'un simple perceptron multicapa fins a complexes xarxes convolucionals i de propostes de regió), s'estimen a partir de dades d'entrenament mitjançant algoritmes d'aprenentatge automàtic supervisats. Finalment, totes les contribucions s'avaluen experimentalment, no només en referents acadèmics estàndard, sinó també en col·leccions de milers d'imatges. S'han considerat documents de text manuscrit i documents musicals manuscrits, ja que representen la majoria de documents presents a biblioteques i arxius. Els resultats mostren que els mètodes proposats són molt precisos i versàtils en una àmplia gamma de documents manuscrits. / [EN] Document Layout Analysis, applied to handwritten documents, aims to automatically obtain the intrinsic structure of a document. Its development as a research field spans from the character segmentation systems developed in the early 1960s to the complex systems designed nowadays, where the goal is to analyze high-level structures (lines of text, paragraphs, tables, etc) and the relationship between them. This thesis first defines the goal of Document Layout Analysis from a probabilistic perspective. Then, the complexity of the problem is reduced, to be handled by modern computing resources, into a set of well-known complementary subproblems. More precisely, three of the main subproblems of Document Layout Analysis are addressed following a probabilistic formulation, namely Baseline Detection, Region Segmentation and Reading Order Determination. One of the main contributions of this thesis is the formalization of Baseline Detection and Region Segmentation problems under a probabilistic framework, where both problems can be handled separately or in an integrated way by the proposed models. The latter approach is proven to be very useful to handle large document collections under restricted computing resources. Later, the Reading Order Determination subproblem is addressed. It is one of the most important, yet underestimated, subproblem of Document Layout Analysis, since it is the bridge that allows us to convert the data extracted from Automatic Text Recognition systems into useful information. Therefore, Reading Order Determination is addressed and formalized as a pairwise probabilistic sorting problem. Moreover, we propose two different decoding algorithms that reduce the computational complexity of the problem. Furthermore, different statistical models are used to represent the probability distribution over the structure of the documents. These models, based on Artificial Neural Networks (from a simple Multilayer Perceptron to complex Convolutional and Region Proposal Networks), are estimated from training data using supervised Machine Learning algorithms. Finally, all the contributions are experimentally evaluated, not only on standard academic benchmarks but also in collections of thousands of images. We consider handwritten text documents and handwritten musical documents as they represent the majority of documents in libraries and archives. The results show that the proposed methods are very accurate and versatile in a very wide range of handwritten documents. / Quirós Díaz, L. (2022). Layout Analysis for Handwritten Documents. A Probabilistic Machine Learning Approach [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181483 / TESIS

Page generated in 0.0792 seconds