
Fully Convolutional Neural Networks for Pixel Classification in Historical Document Images

Stewart, Seth Andrew 01 October 2018 (has links)
We use a Fully Convolutional Neural Network (FCNN) to classify pixels in historical document images, enabling the extraction of high-quality, pixel-precise and semantically consistent layers of masked content. We also analyze a dataset of hand-labeled historical form images of unprecedented detail and complexity. The semantic categories we consider in this new dataset include handwriting, machine-printed text, dotted and solid lines, and stamps. Segmentation of document images into distinct layers allows handwriting, machine print, and other content to be processed and recognized discriminatively, and therefore more intelligently than might be possible with content-unaware methods. We show that an efficient FCNN with relatively few parameters can accurately segment documents having similar textural content when trained on a single representative pixel-labeled document image, even when layouts differ significantly. In contrast to the overwhelming majority of existing semantic segmentation approaches, we allow multiple labels to be predicted per pixel location, which enables direct prediction and reconstruction of overlapped content. We analyze prevalent pixel-wise performance measures and show that several popular ones can be manipulated adversarially, yielding arbitrarily high scores depending on the type of bias used to generate the ground truth. We propose a solution to this gaming problem by comparing absolute performance to an estimated human level of performance. We also present results on a recent international competition requiring the automatic annotation of billions of pixels, in which our method took first place.
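The multi-label formulation described above, where each pixel may carry several labels at once, can be sketched with independent per-class sigmoid outputs instead of a mutually exclusive softmax. Below is a minimal illustration assuming PyTorch; the layer sizes, class list and loss choice are illustrative assumptions, not the architecture used in the thesis.

```python
import torch
import torch.nn as nn

# Illustrative label set; the thesis dataset distinguishes handwriting,
# machine print, dotted/solid lines and stamps, which may overlap.
CLASSES = ["handwriting", "machine_print", "line", "stamp"]

class TinyFCNN(nn.Module):
    """A small fully convolutional network producing one logit map per class,
    so each pixel can carry several labels at once (overlapped content)."""
    def __init__(self, n_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(64, n_classes, 1)  # 1x1 conv -> per-pixel logits

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyFCNN()
image = torch.rand(1, 1, 256, 256)                                   # grayscale crop
target = torch.randint(0, 2, (1, len(CLASSES), 256, 256)).float()    # multi-hot masks

# Per-class sigmoid + binary cross-entropy permits overlapping labels per pixel,
# unlike softmax + cross-entropy, which forces exactly one label per pixel.
loss = nn.BCEWithLogitsLoss()(model(image), target)
masks = torch.sigmoid(model(image)) > 0.5    # independent binary decision per class
```

With per-class sigmoids, a pixel where a stamp overlaps handwriting can legitimately receive both labels, which is what makes direct reconstruction of overlapped content possible.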

Wordspotting from multilingual and stylistic documents / Word spotting in multilingual and graphical document images

Tarafdar, Arundhati 12 July 2017 (has links)
Document image analysis (DIA) tools and methods now make it possible to search collections of document images by keyword even when no transcription is available. In this context, much work has already been done on OCR and on word spotting systems dedicated to textual documents with simple layouts. In contrast, very few approaches have been studied for searching documents that contain multi-oriented and multi-scale text, as found in graphical documents. Images of geographical maps, for example, can contain symbols, graphics and text with different orientations and sizes, and characters may be connected to one another or to graphical elements, which makes word spotting in such documents a difficult task. In this thesis we propose a set of tools and methods dedicated to spotting words written in Bangla or English (Roman script) in images of geographical documents. The proposed approach relies on several original contributions.

Word spotting in graphical documents is a very challenging task. To address such scenarios, this thesis develops a word spotting system dedicated to geographical documents with Bangla and English (Roman) scripts. In the proposed system, text and graphics layers are first separated using filtering, clustering and self-reinforcement through a classifier, and text components are represented by a probabilistic measure rather than a binary decision. In the text layer, a water-reservoir-based character segmentation method is then applied to extract individual characters from the document, and these isolated characters are recognized using rotation-invariant features coupled with an SVM classifier. Well-recognized characters are grouped by size, and an initial spotting pass searches for the query word among these character groups. If the system can only spot a word partially because of noise, SIFT is applied to recover the missing portion of the partial match. Experimental results on Roman- and Bangla-script document images show that the method can spot the locations of query words in text-labeled graphical documents. Experiments are carried out on an annotated dataset developed for this work, which we have made publicly available for other researchers.
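As a rough illustration of the recognition stage described above, rotation-invariant shape features of isolated characters can be fed to an SVM. The sketch below uses OpenCV Hu moments purely as a stand-in for the rotation-invariant features of the thesis, and scikit-learn's SVC as the classifier; the function names, confidence threshold and data layout are illustrative assumptions, not the thesis implementation.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def rotation_invariant_features(char_img: np.ndarray) -> np.ndarray:
    """Log-scaled Hu moments of a grayscale character crop (rotation invariant).
    A stand-in for the rotation-invariant descriptor used in the thesis."""
    _, binary = cv2.threshold(char_img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    hu = cv2.HuMoments(cv2.moments(binary)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

def train_character_classifier(char_images, labels):
    """Hypothetical training step: isolated character crops from the
    segmentation stage, labeled with their Bangla/Roman character classes."""
    X = np.stack([rotation_invariant_features(img) for img in char_images])
    clf = SVC(kernel="rbf", probability=True)   # probabilistic output, as in the text layer
    clf.fit(X, labels)
    return clf

def recognize(clf, char_img, min_confidence=0.6):
    """Return the predicted character class, or None if the classifier is unsure,
    so that poorly recognized characters are excluded from the spotting groups."""
    probs = clf.predict_proba([rotation_invariant_features(char_img)])[0]
    best = int(np.argmax(probs))
    return clf.classes_[best] if probs[best] >= min_confidence else None
```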

Camera-Captured Document Image Analysis

Kasar, Thotreingam 11 1900 (has links) (PDF)
Text is no longer confined to scanned pages and often appears in camera-based images originating from text on real-world objects. Unlike images from conventional flatbed scanners, which have a controlled acquisition environment, camera-based images pose new challenges such as uneven illumination, blur, poor resolution, perspective distortion and 3D deformations that can severely affect the performance of any optical character recognition (OCR) system. Due to the variations in imaging conditions as well as the target document type, traditional OCR systems designed for scanned images cannot be directly applied to camera-captured images, and a new level of processing needs to be addressed. In this thesis, we study some of the issues commonly encountered in camera-based image analysis and propose novel methods to overcome them. All the methods make use of color connected components.

1. Connected component descriptor for document image mosaicing. Document image analysis often requires mosaicing when it is not possible to capture a large document at a reasonable resolution in a single exposure. Such a document is captured in parts and mosaicing stitches them into a single image. Since connected components (CCs) in a document image can easily be extracted regardless of image rotation, scale and perspective distortion, we design a robust feature named the connected component descriptor that is tailored for mosaicing camera-captured document images. The method involves extraction of a circular measurement region around each CC and its description using the angular radial transform (ART). To ensure geometric consistency during feature matching, the ART coefficients of a CC are augmented with those of its two nearest neighbors. Our method addresses two critical issues often encountered in correspondence matching: (i) the stability of features and (ii) robustness against false matches due to multiple instances of many characters in a document image. We illustrate the effectiveness of the proposed method on camera-captured document images exhibiting large variations in viewpoint, illumination and scale.

2. Font and background color independent text binarization. The first step in an OCR system, after document acquisition, is binarization, which converts a gray-scale/color image into a two-level image: the foreground text and the background. We propose two methods for binarization of color documents whereby the foreground text is output as black and the background as white regardless of the polarity of foreground-background shades.

(a) Hierarchical CC Analysis: The method employs an edge-based connected component approach and automatically determines a threshold for each component. It overcomes several limitations of existing locally-adaptive thresholding techniques. Firstly, it can handle documents with multi-colored texts and different background shades. Secondly, the method is applicable to documents having text of widely varying sizes, usually not handled by local binarization methods. Thirdly, the method automatically computes the threshold for binarization and the logic for inverting the output from the image data, and does not require any input parameter. However, the method is sensitive to complex backgrounds since it relies on edge information to identify CCs. It also uses script-specific characteristics to filter out edge components before binarization and currently works well for Roman script only.
(b) Contour-based color clustering (COCOCLUST): To overcome the above limitations, we introduce a novel unsupervised color clustering approach that operates on a ‘small’ representative set of color pixels identified using contour information. Based on the assumption that every character is of a uniform color, we analyze each color layer individually and identify potential text regions for binarization. Experiments on several complex images having large variations in font, size, color, orientation and script illustrate the robustness of the method.

3. Multi-script and multi-oriented text extraction from scene images. Scene text understanding normally involves a pre-processing step of text detection and extraction before subjecting the acquired image to the character recognition task. The subsequent recognition task is performed only on the detected text regions so as to mitigate the effect of background complexity. We propose a color-based CC labeling for robust text segmentation from natural scene images. Text CCs are identified using a combination of support vector machine and neural network classifiers trained on a set of low-level features derived from boundary, stroke and gradient information. We develop a semi-automatic annotation toolkit to generate pixel-accurate ground truth of 100 scenic images containing text in various layout styles and multiple scripts. The overall precision, recall and f-measure obtained on our dataset are 0.8, 0.86 and 0.83, respectively. The proposed method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset, which, however, has only horizontal English text. The overall precision, recall and f-measure obtained are 0.63, 0.59 and 0.61 respectively, which is comparable to the best performing methods in the ICDAR 2005 text locating competition. A recent method proposed by Epshtein et al. [1] achieves better results but cannot handle arbitrarily oriented text. Our method, however, works well for generic scene images having arbitrary text orientations.

4. Alignment of curved text lines. Conventional OCR systems perform poorly on document images that contain multi-oriented text lines. We propose a technique that first identifies individual text lines by grouping adjacent CCs based on their proximity and regularity. For each identified text string, a B-spline curve is fitted to the centroids of the constituent characters and normal vectors are computed along the fitted curve. Each character is then individually rotated such that the corresponding normal vector is aligned with the vertical axis. The method has been tested on a dataset consisting of 50 images with text laid out in various ways, namely along arcs, waves, triangles and a combination of these with linearly skewed text lines. It yields 95.9% recognition accuracy on text strings where, before alignment, state-of-the-art OCRs fail to recognize any text.

The CC-based pre-processing algorithms developed are well-suited for processing camera-captured images. We demonstrate the feasibility of the algorithms on the publicly-available ICDAR 2003 robust reading competition dataset and our own database comprising camera-captured document images that contain multiple scripts and arbitrary text layouts.
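Two hedged sketches may help make the ideas above concrete. First, the per-component thresholding behind the edge-based binarization in 2(a) can be approximated by giving each edge-derived connected component its own Otsu threshold and choosing the foreground polarity locally; OpenCV and NumPy are assumed, and the noise filter, polarity rule and threshold values are simplifications, not the thesis logic.

```python
import cv2
import numpy as np

def binarize_per_component(gray: np.ndarray) -> np.ndarray:
    """Toy edge-based, component-wise binarization: each edge component gets its
    own Otsu threshold, and the foreground polarity is chosen locally so that
    text comes out black on white regardless of the original shades."""
    edges = cv2.Canny(gray, 50, 150)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(edges, connectivity=8)
    out = np.full_like(gray, 255)                      # white background
    for i in range(1, n):                              # skip background label 0
        x, y, w, h, area = stats[i]
        if area < 10:                                  # discard tiny noise components
            continue
        roi = gray[y:y + h, x:x + w]
        t, _ = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        dark = roi <= t
        # Assume the smaller of the two regions (the stroke) is the foreground.
        fg = dark if dark.mean() < 0.5 else ~dark
        out[y:y + h, x:x + w][fg] = 0                  # paint foreground black
    return out

# Usage: result = binarize_per_component(cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE))
```

Second, the curved-line alignment in item 4 can be sketched by fitting a B-spline through the character centroids, taking the tangent angle at each centroid, and rotating each character crop so that its local baseline becomes horizontal (equivalently, the normal becomes vertical); SciPy and OpenCV are assumed, and the smoothing factor and rotation convention are illustrative choices rather than the thesis implementation.

```python
import cv2
import numpy as np
from scipy.interpolate import splprep, splev

def align_curved_text(char_crops, centroids):
    """Rotate each character crop so the fitted text-line curve becomes locally
    horizontal at that character's centroid.

    char_crops: list of character images; centroids: list of (x, y) points
    along the curved text line, in reading order (at least four points)."""
    pts = np.asarray(centroids, dtype=float).T           # shape (2, n)
    tck, u = splprep(pts, s=len(centroids))              # fit a smoothing B-spline
    dx, dy = splev(u, tck, der=1)                        # tangent at each centroid
    aligned = []
    for crop, tx, ty in zip(char_crops, dx, dy):
        angle = np.degrees(np.arctan2(ty, tx))           # local baseline angle
        h, w = crop.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        # Rotating by the baseline angle maps the curve's tangent onto the
        # horizontal axis, making the character locally upright for the OCR.
        aligned.append(cv2.warpAffine(crop, M, (w, h), borderValue=255))
    return aligned
```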
