1

Identificação da correlação entre as características das imagens de documentos e os impactos na fidelidade visual em função da taxa de compressão / Identification of the correlation between document image characteristics and their impact on visual fidelity as a function of compression ratio

Tsujiguchi, Vitor Hitoshi 11 October 2011 (has links)
Document images are digitized documents with textual content. These documents are composed of characters and their layout, and share common characteristics, such as the presence of borders and boundaries in the shape of each character. This work analyzes the relationship between the characteristics of document images and the impact of the compression process on visual fidelity. Objective metrics are employed to analyze the characteristics of document images: the Image Activity Measure (IAM) in the spatial (pixel) domain, and the Spectral Activity Measure (SAM) in the spectral domain. The performance of image compression techniques based on the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) is evaluated on document images by applying different compression levels with each technique. The experiments are performed on digital images of printed documents and manuscripts from books and periodicals, covering texts written from the 16th to the 19th century. This material was collected from the Brasiliana Digital Library (www.brasiliana.usp.br) in Brazil. Experimental results show that the activity measures in the spatial and spectral domains directly influence the visual fidelity of the compressed images for both DCT- and DWT-based techniques. At a fixed compression ratio, for either technique, higher IAM values and lower SAM levels in the reference image result in lower visual fidelity after compression.
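As a rough illustration of the activity metrics described above, an image activity measure is commonly defined as the normalized sum of absolute differences between adjacent pixels, and a spectral activity measure as a flatness statistic of the power spectrum. The sketch below assumes those common definitions; the thesis's exact formulations are not given in the abstract.

```python
import numpy as np

def image_activity_measure(img: np.ndarray) -> float:
    """Spatial-domain activity: mean absolute difference between
    horizontally and vertically adjacent pixels (assumed IAM form)."""
    img = img.astype(np.float64)
    dh = np.abs(np.diff(img, axis=1)).sum()
    dv = np.abs(np.diff(img, axis=0)).sum()
    return (dh + dv) / img.size

def spectral_activity_measure(img: np.ndarray) -> float:
    """Spectral-domain activity: ratio of the geometric mean to the
    arithmetic mean of the power spectrum (spectral flatness; an
    assumed SAM-style definition, not necessarily the thesis's)."""
    power = np.abs(np.fft.fft2(img.astype(np.float64))) ** 2
    power = power[power > 0]
    geo = np.exp(np.mean(np.log(power)))
    return geo / power.mean()

# A flat image has zero spatial activity; a noisy one has high activity.
flat = np.full((64, 64), 128)
noisy = np.random.default_rng(0).integers(0, 256, (64, 64))
print(image_activity_measure(flat), image_activity_measure(noisy))
```

Under these definitions, high IAM (busy, high-contrast text) and low SAM (energy concentrated in few frequencies) characterize exactly the images the abstract reports as hardest to compress faithfully.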
3

Analysis Of Multi-lingual Documents With Complex Layout And Content

Pati, Peeta Basa 11 1900 (has links)
A document image, besides text, may contain pictures, graphs, signatures, logos, barcodes, hand-drawn sketches and/or seals. Further, the text blocks in an image may be laid out in a Manhattan or any complex layout. Document layout analysis is an important preprocessing step before subjecting any such image to OCR. Here, the image with complex layout and content is segmented into its constituent components. For many present-day applications, separating the text from the non-text blocks is sufficient. This enables the conversion of the text elements present in the image to their corresponding editable form. In this work, an effort has been made to separate the text areas from the various kinds of possible non-text elements. The document images may have been obtained from a scanner or a camera. If the source is a scanner, there is control over the scanning resolution and the lighting of the paper surface. Moreover, during the scanning process, the paper surface remains parallel to the sensor surface. However, when an image is obtained through a camera, these advantages are no longer available. Here, an algorithm is proposed to separate the text present in an image from the clutter, irrespective of the imaging technology used. This is achieved by using both the structural and textural information of the text present in the gray image. A bank of Gabor filters characterizes the statistical distribution of the text elements in the document. A connected component based technique removes certain types of non-text elements from the image. When a camera is used to acquire document images, color information is generally obtained along with the structural and textural information of the text. It can be assumed that text present in an image has a certain amount of color homogeneity, so a graph-theoretical color clustering scheme is employed to segment the iso-color components of the image. Each iso-color image is then analyzed separately for its structural and textural properties.
The results of such analyses are merged with the information obtained from the gray component of the image. This helps to separate the colored text areas from the non-text elements. The proposed scheme is computationally intensive, because the separation of the text from non-text entities is performed at the pixel level. Since any entity is represented by a connected set of pixels, it makes more sense to carry out the separation only at specific points, selected as representatives of their neighborhood. Harris' operator evaluates an edge measure at each pixel and selects pixels which are locally rich in this measure. These points are then employed for separating text from non-text elements. Many government documents and forms in India are bi-lingual or tri-lingual in nature. Further, in school textbooks, it is common to find English words interspersed within sentences in the main Indian language of the book. In such documents, successive words in a line of text may be of different scripts (languages). Hence, for OCR of these documents, the script must be recognized at the level of words, rather than lines or paragraphs. A database of about 20,000 words each from 11 Indian scripts (Bengali, Devanagari, Gujarati, Kannada, Malayalam, Odiya, Punjabi, Roman, Tamil, Telugu and Urdu) is created. This is so far the largest database of Indian words collected and deployed for script recognition purposes. Here again, a bank of 36 Gabor filters is used to extract the feature vector which represents the script of the word. The effectiveness of Gabor features is compared with that of the DCT, and it is found that Gabor features marginally outperform the DCT. Simple, linear and non-linear classifiers are employed to classify the word in the feature space. It is assumed that a scheme developed to recognize the script of words would work equally well for sentences and paragraphs. This assumption has been verified with supporting results.
A systematic study has been conducted to evaluate and compare the accuracy of various feature-classifier combinations for word script recognition. We have considered the cases of bi-script and tri-script documents, which are widely available. Average recognition accuracies for the bi-script and tri-script cases are 98.4% and 98.2%, respectively. A hierarchical blind script recognizer involving all eleven scripts has been developed and evaluated, which yields an average accuracy of 94.1%. The major contributions of the thesis are:
• A graph-theoretic color clustering scheme is used to segment colored text.
• A scheme is proposed to separate text from the non-text content of documents with complex layout and content, captured by scanner or camera.
• Computational complexity is reduced by performing the separation task on a selected set of locally edge-rich points.
• Script identification at the word level is carried out using different feature-classifier combinations; Gabor features with an SVM classifier outperform all other feature-classifier combinations.
• A hierarchical blind script recognition algorithm, involving the recognition of 11 Indian scripts, is developed. This structure employs the most efficient feature-classifier combination at each node of the tree to maximize system performance. A sequential forward feature selection algorithm is employed to select the most discriminating features, on a case-by-case basis, for script recognition.
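A Gabor filter bank of the kind used above for script features can be sketched as follows. This is a minimal illustration with 6 orientations at 3 wavelengths (18 filters); the thesis uses a 36-filter bank whose exact parameters are not stated in the abstract, and the feature here (mean absolute response per filter) is an assumed, typical choice.

```python
import numpy as np

def gabor_kernel(ksize: int, sigma: float, theta: float, lam: float) -> np.ndarray:
    """Real (cosine) Gabor kernel at orientation theta and wavelength lam."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_features(img: np.ndarray, n_orient: int = 6,
                   wavelengths=(4, 8, 16)) -> np.ndarray:
    """Feature vector for script classification: mean absolute filter
    response per (orientation, wavelength) pair."""
    img = img.astype(np.float64)
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            kern = gabor_kernel(2 * int(lam) + 1, lam / 2, theta, lam)
            # circular convolution via FFT, for brevity
            resp = np.fft.irfft2(np.fft.rfft2(img) *
                                 np.fft.rfft2(kern, img.shape), img.shape)
            feats.append(np.abs(resp).mean())
    return np.array(feats)
```

Each word image would be mapped to such a vector and passed to a classifier (the abstract reports SVM working best); the oriented responses capture the stroke-direction statistics that distinguish scripts.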
4

Fast Registration of Tabular Document Images Using the Fourier-Mellin Transform

Hutchison, Luke Alexander Daysh 24 March 2004 (has links)
Image registration, the process of finding the transformation that best maps one image to another, is an important tool in document image processing. Having properly-aligned microfilm images can help in manual and automated content extraction, zoning, and batch compression of images. An image registration algorithm is presented that quickly identifies the global affine transformation (rotation, scale, translation and/or shear) that maps one tabular document image to another, using the Fourier-Mellin Transform. Each component of the affine transform is recovered independently from the others, dramatically reducing the parameter space of the problem, and improving upon standard Fourier-Mellin Image Registration (FMIR), which only directly separates translation from the other components. FMIR is also extended to handle shear, as well as different scale factors for each document axis. This registration method deals with all transform components in a uniform way, by working in the frequency domain. Registration is limited to foreground pixels (the document form and printed text) through the introduction of a novel, locally adaptive foreground-background segmentation algorithm, based on the median filter. The background removal algorithm is also demonstrated as a useful tool to remove ambient signal noise during correlation. Common problems with FMIR are eliminated by background removal, meaning that apodization (tapering down to zero at the edge of the image) is not needed for accurate recovery of the rotation parameter, allowing the entire image to be used for registration. An effective new optimization to the median filter is presented. Rotation and scale parameter detection is less susceptible to problems arising from the non-commutativity of rotation and "tiling" (periodicity) than for standard FMIR, because only the regions of the frequency domain directly corresponding to tabular features are used in registration.
An original method is also presented for automatically obtaining blank document templates from a set of registered document images, by computing the "pointwise median" of a set of registered documents. Finally, registration is demonstrated as an effective tool for predictive image compression. The presented registration algorithm is reliable and robust, and handles a wider range of transformation types than most document image registration systems (which typically only perform deskewing).
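The translation component of a frequency-domain registration like the one above is typically recovered by phase correlation on the normalized cross-power spectrum. The sketch below shows only that translation step, not the thesis's full FMIR pipeline (which additionally recovers rotation, scale and shear via the Fourier-Mellin Transform):

```python
import numpy as np

def phase_correlation(a: np.ndarray, b: np.ndarray):
    """Estimate the integer translation (dy, dx) that maps image b onto a,
    via the peak of the normalized cross-power spectrum."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12        # keep phase information only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the image into negative offsets
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(0)
img = rng.random((128, 128))
shifted = np.roll(img, (5, -9), axis=(0, 1))
print(phase_correlation(shifted, img))   # → (5, -9)
```

Because translation appears only in the phase of the spectrum, the magnitude spectrum is translation-invariant; standard FMIR exploits this by resampling the magnitude to log-polar coordinates so that rotation and scale also become recoverable shifts.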
5

Évaluation de la qualité des documents anciens numérisés / Quality evaluation of digitized historical documents

Rabeux, Vincent 06 March 2013 (has links)
This PhD thesis deals with quality evaluation of digitized document images. In order to measure the quality of a document image, we propose new features dedicated to characterizing the most common degradations. We also propose to use these features to build prediction models able to predict the performance of different types of document analysis algorithms. The features are defined by analyzing the impact of a specific degradation on the results of an algorithm, and are then used to build prediction models with statistical regressors.
The relevance of the proposed features and prediction models is assessed in several experiments. The first aims to predict the performance of eleven binarization methods. The second builds an automatic procedure able to select the best binarization method for each image. Finally, the third builds a prediction model for two commonly used OCRs as a function of the severity of ink show-through (ink from the recto diffusing through to the verso of a document). This work on performance prediction is also an opportunity to discuss the scientific problems of creating ground truth and evaluating performance.
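The prediction methodology above amounts to fitting a statistical regressor from degradation features to an algorithm's measured performance. A minimal sketch with an ordinary least-squares model follows; the feature names and the linear form are illustrative assumptions, since the thesis's actual descriptors and regressors are not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical degradation features per document image:
# [show-through level, noise level, contrast] -- illustrative only.
X = rng.random((200, 3))

# Simulated performance of a binarization algorithm (e.g. an F-measure),
# assumed here to degrade linearly with the first two features.
y = 0.95 - 0.4 * X[:, 0] - 0.2 * X[:, 1] + 0.01 * rng.standard_normal(200)

# Fit an ordinary least-squares regressor with an intercept term.
A = np.hstack([X, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_performance(features) -> float:
    """Predict algorithm performance from degradation features."""
    return float(np.append(features, 1.0) @ coef)

print(predict_performance([0.1, 0.2, 0.5]))  # close to 0.95 - 0.04 - 0.04
```

With one such model per binarization method, the automatic selection procedure described above reduces to evaluating every model on an image's features and picking the method with the highest predicted performance.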
6

Camera-Captured Document Image Analysis

Kasar, Thotreingam 11 1900 (has links) (PDF)
Text is no longer confined to scanned pages and often appears in camera-based images originating from text on real-world objects. Unlike the images from conventional flatbed scanners, which have a controlled acquisition environment, camera-based images pose new challenges such as uneven illumination, blur, poor resolution, perspective distortion and 3D deformations that can severely affect the performance of any optical character recognition (OCR) system. Due to the variations in the imaging conditions as well as the target document type, traditional OCR systems, designed for scanned images, cannot be directly applied to camera-captured images, and a new level of processing needs to be addressed. In this thesis, we study some of the issues commonly encountered in camera-based image analysis and propose novel methods to overcome them. All the methods make use of color connected components.
1. Connected component descriptor for document image mosaicing. Document image analysis often requires mosaicing when it is not possible to capture a large document at a reasonable resolution in a single exposure. Such a document is captured in parts, and mosaicing stitches them into a single image. Since connected components (CCs) in a document image can easily be extracted regardless of the image rotation, scale and perspective distortion, we design a robust feature named the connected component descriptor that is tailored for mosaicing camera-captured document images. The method involves extraction of a circular measurement region around each CC and its description using the angular radial transform (ART). To ensure geometric consistency during feature matching, the ART coefficients of a CC are augmented with those of its 2 nearest neighbors. Our method addresses two critical issues often encountered in correspondence matching: (i) the stability of features and (ii) robustness against false matches due to multiple instances of many characters in a document image. We illustrate the effectiveness of the proposed method on camera-captured document images exhibiting large variations in viewpoint, illumination and scale.
2. Font and background color independent text binarization. The first step in an OCR system, after document acquisition, is binarization, which converts a gray-scale/color image into a two-level image: the foreground text and the background. We propose two methods for binarization of color documents whereby the foreground text is output as black and the background as white regardless of the polarity of the foreground-background shades. (a) Hierarchical CC analysis: the method employs an edge-based connected component approach and automatically determines a threshold for each component. It overcomes several limitations of existing locally-adaptive thresholding techniques. Firstly, it can handle documents with multi-colored texts with different background shades. Secondly, the method is applicable to documents having text of widely varying sizes, usually not handled by local binarization methods. Thirdly, the method automatically computes the threshold for binarization and the logic for inverting the output from the image data, and does not require any input parameter. However, the method is sensitive to complex backgrounds, since it relies on the edge information to identify CCs. It also uses script-specific characteristics to filter out edge components before binarization, and currently works well for the Roman script only. (b) Contour-based color clustering (COCOCLUST): to overcome the above limitations, we introduce a novel unsupervised color clustering approach that operates on a 'small' representative set of color pixels identified using the contour information. Based on the assumption that every character is of a uniform color, we analyze each color layer individually and identify potential text regions for binarization. Experiments on several complex images having large variations in font, size, color, orientation and script illustrate the robustness of the method.
3. Multi-script and multi-oriented text extraction from scene images. Scene text understanding normally involves a pre-processing step of text detection and extraction before subjecting the acquired image to the character recognition task. The subsequent recognition task is performed only on the detected text regions so as to mitigate the effect of background complexity. We propose a color-based CC labeling for robust text segmentation from natural scene images. Text CCs are identified using a combination of support vector machine and neural network classifiers trained on a set of low-level features derived from the boundary, stroke and gradient information. We develop a semi-automatic annotation toolkit to generate pixel-accurate ground truth for 100 scenic images containing text in various layout styles and multiple scripts. The overall precision, recall and f-measure obtained on our dataset are 0.8, 0.86 and 0.83, respectively. The proposed method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset, which, however, has only horizontal English text. The overall precision, recall and f-measure obtained are 0.63, 0.59 and 0.61 respectively, which is comparable to the best performing methods in the ICDAR 2005 text locating competition. A recent method proposed by Epshtein et al. [1] achieves better results but cannot handle arbitrarily oriented text. Our method, however, works well for generic scene images having arbitrary text orientations.
4. Alignment of curved text lines. Conventional OCR systems perform poorly on document images that contain multi-oriented text lines. We propose a technique that first identifies individual text lines by grouping adjacent CCs based on their proximity and regularity. For each identified text string, a B-spline curve is fitted to the centroids of the constituent characters and normal vectors are computed along the fitted curve. Each character is then individually rotated such that the corresponding normal vector is aligned with the vertical axis. The method has been tested on a data set consisting of 50 images with text laid out in various ways, namely along arcs, waves, triangles and combinations of these with linearly skewed text lines. It yields 95.9% recognition accuracy on text strings where, before alignment, state-of-the-art OCRs fail to recognize any text.
The CC-based pre-processing algorithms developed are well-suited for processing camera-captured images. We demonstrate the feasibility of the algorithms on the publicly-available ICDAR 2003 robust reading competition dataset and our own database comprising camera-captured document images that contain multiple scripts and arbitrary text layouts.
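All four methods above build on connected components extracted from the image. A minimal sketch of binary connected-component labeling by breadth-first flood fill follows (standard 4-connectivity on a binary grid; the thesis works with color CCs, which this does not attempt to reproduce):

```python
from collections import deque

def label_components(grid):
    """Label 4-connected components of nonzero cells in a 2-D list.
    Returns (labels, n): a parallel grid of labels (0 = background,
    1..n = components) and the component count n."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if grid[sy][sx] and not labels[sy][sx]:
                current += 1                      # start a new component
                labels[sy][sx] = current
                queue = deque([(sy, sx)])
                while queue:                      # flood fill its pixels
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and grid[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
labels, n = label_components(img)
print(n)  # → 2
```

Once pixels are grouped into components like this, per-component operations such as the descriptor extraction, thresholding and text-line grouping described above become straightforward to express.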
