1 |
A Software Toolkit for Handprinted Form Readers. Cracknell, Christopher Robert William, January 1999.
No description available.
|
2 |
Intelligent Pen: A Least Cost Search Approach to Stroke Extraction in Historical Documents. Bauer, Kevin L., 01 May 2016.
Extracting strokes from handwriting in historical documents provides high-level features for the challenging problem of handwriting recognition. Such handwriting often contains noise, faint or incomplete strokes, strokes with gaps, overlapping ascenders and descenders, and competing lines when embedded in a table or form, making it unsuitable for local line-following algorithms or associated binarization schemes. We introduce Intelligent Pen for piece-wise optimal stroke extraction. Extracted strokes are stitched together to provide a complete trace of the handwriting. Intelligent Pen formulates stroke extraction as a set of piece-wise optimal paths, extracted and assembled in cost order. As such, Intelligent Pen is robust to noise, gaps, faint handwriting and even competing lines and strokes. Intelligent Pen traces compare closely with the shape of the handwriting as well as the order in which it was written. A quantitative comparison with an ICDAR handwritten stroke data set shows Intelligent Pen traces to be within 0.78 pixels (mean difference) of the manually created strokes.
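The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea: a Dijkstra-style least-cost path search over a pixel cost map in which ink pixels are cheap and background pixels are expensive, so the cheapest path between two endpoints follows the stroke. The cost-map construction, 8-connectivity, and function names are assumptions for illustration, not the thesis implementation.

```python
import heapq
import numpy as np

def least_cost_path(cost, start, goal):
    """Dijkstra search over a 2D pixel cost map (8-connected).

    cost: 2D array where ink pixels carry low cost and background pixels
    carry high cost, so cheap paths follow the written stroke.
    Returns the list of (row, col) pixels on the cheapest start-to-goal path
    (assumes the goal is reachable).
    """
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = cost[start]
    frontier = [(cost[start], start)]
    while frontier:
        d, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            break
        if d > dist[r, c]:          # stale queue entry
            continue
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < h and 0 <= nc < w:
                    nd = d + cost[nr, nc]
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(frontier, (nd, (nr, nc)))
    # walk back from the goal to recover the pixel path
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```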
|
3 |
A Framework for Re-Purposing Textbooks Using Learning Outcomes/Methodology, Device Characteristics, Representation and User Dimensions. Ciftci, Tolga, 03 October 2013.
As digital books take center stage in our lives, the printed book remains important. A large number of books printed on paper still have much to offer readers (e.g., prose by less famous authors, or old books with interesting and original problems). To help individuals digitize and reuse their physical and digital books, we built a framework that converts books to other formats while taking four dimensions into consideration: learning outcomes or methodology, target device characteristics, representation, and the user. Our focus is on history textbooks; consequently, we do not consider problems such as math formulas. This work has the potential to help people deal with the huge backlog of physical books that risk becoming invisible as digital books take off.
To show that our platform can help in repurposing books for student study activities, we have developed some transformations. The transformations we have implemented show that the framework can be used to add study aids to books, optimize books for a target platform (an e-reader device and application combination), supplement the available features of a target platform, and maintain consistency across various audio/visual devices and e-book formats.
One of the important steps in the thesis was determining the study activities that we would support as examples in our implementation. We have chosen to implement support for the survey, question, read and review activities of the SQ3R reading technique. We have also implemented support for additional activities like search. The chosen activities and the support implemented for these activities are examples and are not meant to be complete.
Another important decision was which target platforms (e-reader device and application combinations) to support. We decided to choose a few representatives and leave the rest as future work. The target devices were selected to cover a variety of device capabilities such as screen size, display technology (e.g., e-ink, VGA), and user interaction styles (e.g., touch-based, button-based), combined with application capabilities (e.g., audio only, visual only, audio-visual, grayscale, and color). The devices selected were: iPad, iPod, iPhone, Kindle 3rd generation, Kindle Fire, Sony PRS and a laptop. The e-reader applications are the ones available for these devices.
|
4 |
Intelligent Indexing: A Semi-Automated, Trainable System for Field Labeling. Clawson, Robert T., 01 September 2014.
We present Intelligent Indexing: a general, scalable, collaborative approach to indexing and transcription of non-machine-readable documents that exploits visual consensus and group labeling while harnessing human recognition and domain expertise. In our system, indexers work directly on the page and, with minimal context switching, can navigate the page, enter labels, and interact with the recognition engine. Interaction with the recognition engine occurs through preview windows that allow the indexer to quickly verify and correct recommendations. This interaction is far superior to conventional post-correction and editing, which is tedious and inefficient. Intelligent Indexing is a trainable system that improves over time and can provide benefit even without prior knowledge. A user study was performed to compare Intelligent Indexing to a basic, manual indexing system. Volunteers report that using Intelligent Indexing is less mentally fatiguing and more enjoyable than the manual system. Their results also show that it significantly reduces (by 30.2%) the time required to index census records, while maintaining comparable accuracy. A video overview of this research is available on YouTube: https://www.youtube.com/watch?v=gqdVzEPnBEw
|
5 |
Artificial Neural Networks-Driven High Precision Tabular Information Extraction from Datasheets. Fernandes, Johan, 11 March 2022.
Global organizations have adopted Industry 4.0 practices to stay viable, relying on information shared through billions of digital documents. The information in such documents is vital to their daily functioning, and much of the most critical information is laid out in tabular format so that it can be presented concisely. Extracting this critical data and providing access to the latest information can help institutions make evidence-based, data-driven decisions, and assembling such data for analysis can further enable organizations to automate processes such as manufacturing. A generalized solution for table text extraction must handle variations in page content and table layout in order to extract the text accurately. We hypothesize that a table text extraction pipeline can extract this data in three stages: the first stage identifies the images that contain tables and detects the table regions; the second stage takes a detected table region and detects the rows and columns of the table; the last stage extracts the text from the cell locations generated by the intersections of the detected rows and columns.
For the first stage of the pipeline, we propose TableDet: a deep learning (artificial neural network) based methodology that solves table detection and table image classification in datasheet (document) images in a single inference. TableDet uses a Cascade R-CNN architecture with a Complete IoU (CIoU) loss at each box head and a deformable convolution backbone to capture the variations of tables that appear at multiple scales and orientations; it also detects text and figures to enhance its table detection performance. We demonstrate the effectiveness of training TableDet with a dual-step transfer learning process and fine-tuning it with Table Aware Cutout (TAC) augmented images. TableDet achieves the highest F1 score for table detection against state-of-the-art solutions on ICDAR 2013 (complete set), ICDAR 2017 (test set) and ICDAR 2019 (test set), with 100%, 99.3% and 95.1% respectively. We show that the enhanced table detection performance can be used to address the table image classification task with the addition of a classification head comprising three conditions. On the table image classification task, TableDet achieves 100% recall and above 92% precision on three test sets. These classification results indicate that all images with tables, along with a significantly reduced number of images without tables, are promoted to the next stage of the table text extraction pipeline.
For the second stage we propose TableStrDet, a deep learning (artificial neural network) based approach that recognizes the structure of the table regions detected in stage 1 by detecting and classifying rows and columns. TableStrDet comprises two Cascade R-CNN architectures, each with a deformable convolution backbone and Complete IoU loss to improve detection performance. One architecture detects and classifies columns as regular columns (columns without a merged cell) and irregular columns (groups of regular columns that share a merged cell); the other detects and classifies rows as regular rows (rows without a merged cell) and irregular rows (groups of regular rows that share a merged cell). Both architectures work in parallel to provide the results in a single inference.
We show that detecting these four classes of objects enhances the quality of table structure detection by capturing table contents that may or may not have hierarchical layouts, evaluated on two public test sets. On the TabStructDB test set we achieve weighted average F1 scores of 72.7% for rows and 78.5% for columns; on the ICDAR 2013 test set we achieve 90.5% for rows and 89.6% for columns. Furthermore, we show that TableStrDet has higher generalization potential across the available datasets.
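For reference, the Complete IoU (CIoU) loss used at each box head augments plain IoU with a center-distance penalty and an aspect-ratio consistency term. The following is a minimal, generic Python sketch of the published CIoU formulation (the box layout and epsilon handling are our assumptions), not code from the thesis.

```python
import math

def ciou_loss(box_p, box_g, eps=1e-7):
    """Complete IoU (CIoU) loss between a predicted and a ground-truth box.

    Boxes are (x1, y1, x2, y2). CIoU = 1 - IoU + center_dist^2 / diag^2
    + alpha * v, where v measures aspect-ratio mismatch.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g

    # intersection and union
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)

    # squared center distance, normalised by the enclosing box diagonal
    cx_p, cy_p = (px1 + px2) / 2, (py1 + py2) / 2
    cx_g, cy_g = (gx1 + gx2) / 2, (gy1 + gy2) / 2
    center_dist2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    ex1, ey1 = min(px1, gx1), min(py1, gy1)
    ex2, ey2 = max(px2, gx2), max(py2, gy2)
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
                              - math.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + center_dist2 / diag2 + alpha * v
```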
|
6 |
End-to-End Full-Page Handwriting Recognition. Wigington, Curtis Michael, 01 May 2018.
Despite decades of research, offline handwriting recognition (HWR) of historical documents remains a challenging problem which, if solved, could greatly improve the searchability of online cultural heritage archives. Historical documents are plagued with noise, degradation, ink bleed-through, overlapping strokes, variation in the slope and slant of the writing, and inconsistent layouts. Often the documents in a collection have been written by thousands of authors, all of whom have significantly different writing styles. In order to better capture the variations in writing styles, we introduce a novel data augmentation technique. This method achieves state-of-the-art results on modern datasets written in English and French and on a historical dataset written in German.
HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition, using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network that finds the start position of handwriting lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) handwriting into dewarped images, and a CNN-LSTM network that reads the characters. SFR exceeds the performance of the winner of the ICDAR 2017 handwriting recognition competition, even when not using the provided competition region annotations.
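As a rough illustration of the final reading stage, the sketch below shows the common shape of a CNN-LSTM line reader trained with CTC on dewarped line images. The layer sizes, character-set size, and training snippet are illustrative assumptions and do not reproduce the actual SFR network.

```python
import torch
import torch.nn as nn

class LineRecognizer(nn.Module):
    """Minimal CNN-LSTM reader for dewarped handwriting line images.

    A small convolutional stack downsamples the image, a bidirectional LSTM
    models the horizontal sequence, and a linear layer emits per-column
    character logits suitable for CTC training.
    """
    def __init__(self, n_chars, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_height = img_height // 4
        self.lstm = nn.LSTM(64 * feat_height, 128, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, n_chars + 1)           # +1 for the CTC blank

    def forward(self, x):                                # x: (B, 1, H, W)
        f = self.cnn(x)                                  # (B, 64, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one step per column
        out, _ = self.lstm(f)
        return self.fc(out)                              # (B, W/4, n_chars + 1)

# CTC training step on a dummy batch (shapes only, for illustration)
model = LineRecognizer(n_chars=80)
logits = model(torch.randn(2, 1, 32, 256)).log_softmax(-1)   # (2, 64, 81)
ctc = nn.CTCLoss(blank=80)
targets = torch.randint(0, 80, (2, 20))
loss = ctc(logits.permute(1, 0, 2), targets,
           input_lengths=torch.full((2,), 64),
           target_lengths=torch.full((2,), 20))
```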
|
7 |
Uma fundamentação matemática para processamento digital de sinais intervalares / A Mathematical Foundation for Interval Digital Signal Processing. Trindade, Roque Mendes Prado, 05 June 2009.
This work develops a mathematical foundation for digital signal processing from the perspective of interval mathematics. It addresses the open problem of precision and representation of data in digital systems by working with an interval representation of signals. Signal processing is a rich and complex area, so the work restricts its focus to linear time-invariant systems. A vast literature exists in the area, but some concepts in interval mathematics still need to be redefined or elaborated in order to build a solid theory of interval signal processing. We construct the basic foundations for signal processing in the interval setting, such as the basic properties of linearity, stability and causality, and an interval version of linear systems and their properties. Interval versions of the convolution and of the Z-transform are presented. Convergence of systems is analyzed using the interval Z-transform, an essentially interval distance, and interval complex numbers, with an application to an interval filter.
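To make the interval setting concrete, the following is a minimal sketch (our own illustration, not the thesis formulation) of a discrete interval convolution, where each sample is a lower/upper pair, interval addition adds endpoints, and interval multiplication takes the hull of the endpoint products.

```python
def i_add(a, b):
    """Interval addition: [a1, a2] + [b1, b2]."""
    return (a[0] + b[0], a[1] + b[1])

def i_mul(a, b):
    """Interval multiplication: the hull of all pairwise endpoint products."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def i_conv(x, h):
    """Discrete convolution of two interval-valued sequences.

    x and h are lists of (lower, upper) pairs; the result encloses every
    point-valued convolution of signals drawn from those intervals.
    """
    y = [(0.0, 0.0)] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        acc = (0.0, 0.0)
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                acc = i_add(acc, i_mul(x[k], h[n - k]))
        y[n] = acc
    return y

# a noisy signal represented as intervals around measured values,
# convolved with a short interval-valued filter
x = [(0.9, 1.1), (1.9, 2.1), (0.4, 0.6)]
h = [(0.5, 0.5), (0.25, 0.25)]
print(i_conv(x, h))
```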
|
8 |
Tabular Information Extraction from Datasheets with Deep Learning for Semantic Modeling. Akkaya, Yakup, 22 March 2022.
The growing popularity of artificial intelligence and machine learning has led many institutions and organizations to adopt the vision of industrial automation. Many corporations have made it their primary objective to deliver goods and services, and to manufacture, more efficiently and with minimal human intervention. Automated document processing and analysis is a critical component of this cycle for many organizations that contribute to the supply chain, and the massive volume and diversity of data created in this rapidly evolving environment makes it a highly desired step. Despite this diversity, important information in these documents is provided in tables; as a result, extracting tabular data is a crucial aspect of document processing.
This thesis applies deep learning methodologies to detect table structure elements for the extraction of data and preparation for semantic modelling. To find an optimal structure definition, we analyzed the performance of deep learning models on different structure formats, such as row/column and cell. A combined row and column detection model performs poorly compared to the other models because of the highly overlapping nature of rows and columns. Separate row and column detection models achieve the best average F1-scores, 78.5% and 79.1% respectively. However, determining cell elements from the row and column detections for semantic modelling is a complicated task because of spanning rows and columns. Considering these facts, a new method for setting the ground-truth information, called content-focused annotation, is proposed to define table elements better. Our content-focused method handles ambiguities caused by large white spaces and the lack of boundary lines in table structures; hence, it provides higher accuracy.
Prior works have addressed the table analysis problem as table detection and table structure detection tasks. However, the impact of dataset structure on table structure detection has not been investigated. We provide a comparison of table structure detection performance on cropped and uncropped datasets. The cropped set consists only of table images cropped from documents, assuming tables are detected perfectly; the uncropped set consists of regular document images. Experiments show that deep learning models can improve detection performance by up to 9% in average precision and average recall on the cropped versions. Furthermore, the impact of cropping is negligible at Intersection over Union (IoU) thresholds of 50%-70% when compared to the uncropped versions. Beyond a 70% IoU threshold, however, cropped datasets provide significantly higher detection performance.
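To illustrate why cell determination from separate row and column detections is harder than it looks, the sketch below (our illustration, not code from the thesis) forms cells as the pairwise intersections of row and column boxes; merged, spanning cells are exactly what this naive pairing cannot recover, which is the difficulty the content-focused annotation is meant to address.

```python
def cells_from_rows_and_columns(rows, cols):
    """Derive cell boxes from detected row and column boxes.

    rows and cols are lists of (x1, y1, x2, y2) detections. Each cell is the
    rectangular intersection of one row and one column; empty intersections
    are skipped. Spanning (merged) cells are NOT recovered by this naive
    pairing, so tables with hierarchical layouts need extra handling.
    """
    cells = []
    for (rx1, ry1, rx2, ry2) in rows:
        for (cx1, cy1, cx2, cy2) in cols:
            x1, y1 = max(rx1, cx1), max(ry1, cy1)
            x2, y2 = min(rx2, cx2), min(ry2, cy2)
            if x2 > x1 and y2 > y1:
                cells.append((x1, y1, x2, y2))
    return cells
```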
|
9 |
Extraction d'informations textuelles au sein de documents numérisés : cas des factures / Extracting Textual Information within Scanned Documents: The Case of Invoices. Pitou, Cynthia, 28 September 2017.
Document processing is the transformation of data that is present in documents and understandable by humans into a format understandable by a computer system. Document analysis and document understanding are the two phases of document processing. Given a document image made up of words, lines and graphical objects such as logos, document analysis consists of extracting and isolating the words, lines and objects, then grouping them into blocks; the blocks so formed constitute the geometric structure of the document. Document understanding maps this geometric structure to a logical structure by building relationships (to the right, to the left, above, below) between the objects of the document. A document processing system must be able to: (i) locate textual information, (ii) identify whether that information is relevant with respect to the other information contained in the document, and (iii) extract that information in a format understandable by a computer program. For the realization of such a system, the major difficulties arise from the variability of document characteristics, such as the type (invoice, form, quotation, report, etc.), the layout (font, style, arrangement), the language, the typography and the scanning quality.
This work is concerned with scanned documents, also known as document images. We are particularly interested in locating textual information in invoice images in order to extract it with an optical character recognition (OCR) engine. Invoices are widely used but non-standard documents: they contain mandatory information (invoice number, unique identifier of the issuing company, VAT amount, net amount, etc.) which, depending on the issuer, can appear at various locations in the document. The contributions presented in this work fall within the framework of region-based localization and extraction of textual information in document images.
First, we present a region-based method guided by quadtree decomposition. The principle is to decompose a document image into four equal sub-regions, recursively, and to try to extract the expected textual information in each region with a free OCR engine; a region that yields the expected information is not decomposed further. The proposed method accurately and efficiently determines the regions of a document image that contain the textual information one wants to locate and retrieve.
In a second, incremental and more flexible approach, we propose a textual information extraction system consisting of a set of prototype regions together with pathways for browsing through these prototype regions. The life cycle of the system comprises five steps:
- Produce synthetic invoice data from real-world invoice images containing the textual information of interest, along with its spatial positions.
- Partition the produced data.
- Derive the prototype regions from the obtained partition clusters.
- Derive pathways for browsing through the prototype regions, from the concept lattice of a suitably defined formal context.
- Incrementally update the set of prototype regions and the set of pathways when additional data has to be added.
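A minimal sketch of the quadtree-guided localization idea follows; the OCR check is represented by a placeholder predicate and the minimum region size is an assumption, so this is an illustration of the principle rather than the thesis implementation.

```python
def find_regions(image, contains_target, min_size=64):
    """Recursive quadtree search for regions holding a target field.

    `image` is a 2D array (H, W); `contains_target(region)` stands in for an
    OCR call that reports whether the expected textual information (e.g. an
    invoice-number pattern) was read inside the region. A region that yields
    the target is kept and not decomposed further; otherwise it is split into
    four quadrants and the search recurses until regions become too small.
    """
    h, w = image.shape[:2]
    if contains_target(image):
        return [image]
    if h < min_size or w < min_size:
        return []
    found = []
    for quadrant in (image[:h // 2, :w // 2], image[:h // 2, w // 2:],
                     image[h // 2:, :w // 2], image[h // 2:, w // 2:]):
        found.extend(find_regions(quadrant, contains_target, min_size))
    return found
```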
|
10 |
Fully Convolutional Neural Networks for Pixel Classification in Historical Document Images. Stewart, Seth Andrew, 01 October 2018.
We use a Fully Convolutional Neural Network (FCNN) to classify pixels in historical document images, enabling the extraction of high-quality, pixel-precise and semantically consistent layers of masked content. We also analyze a dataset of hand-labeled historical form images of unprecedented detail and complexity. The semantic categories we consider in this new dataset include handwriting, machine-printed text, dotted and solid lines, and stamps. Segmenting document images into distinct layers allows handwriting, machine print, and other content to be processed and recognized discriminatively, and therefore more intelligently than is possible with content-unaware methods. We show that an efficient FCNN with relatively few parameters can accurately segment documents having similar textural content when trained on a single representative pixel-labeled document image, even when layouts differ significantly. In contrast to the overwhelming majority of existing semantic segmentation approaches, we allow multiple labels to be predicted per pixel location, which allows for direct prediction and reconstruction of overlapped content. We analyze prevalent pixel-wise performance measures and show that several popular measures can be manipulated adversarially, yielding arbitrarily high scores depending on the type of bias used to generate the ground truth. We propose a solution to this gaming problem by comparing absolute performance to an estimated human level of performance. We also present results from a recent international competition requiring the automatic annotation of billions of pixels, in which our method took first place.
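As an illustration of multi-label per-pixel prediction, the sketch below uses a tiny fully convolutional network with one sigmoid output map per semantic layer, so overlapping content (e.g., handwriting crossing a printed line) can receive several labels at the same pixel. The layer sizes, number of semantic layers, and loss are assumptions for illustration, not the thesis architecture.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Small fully convolutional network for multi-label pixel classification.

    Every pixel receives an independent logit for each semantic layer
    (handwriting, machine print, lines, stamps, ...). Sigmoid plus binary
    cross-entropy replaces the usual softmax, which would force exactly one
    label per pixel and could not represent overlapped content.
    """
    def __init__(self, n_layers=5):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=2, dilation=2), nn.ReLU(),   # wider context, same resolution
            nn.Conv2d(32, n_layers, 1),                               # per-pixel logits
        )

    def forward(self, x):              # x: (B, 1, H, W) grayscale page
        return self.body(x)            # (B, n_layers, H, W)

# toy training step: multi-hot target masks, one channel per semantic layer
model = TinyFCN()
img = torch.rand(1, 1, 128, 128)
target = torch.randint(0, 2, (1, 5, 128, 128)).float()
loss = nn.BCEWithLogitsLoss()(model(img), target)
```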
|