• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • 2
  • 1
  • Tagged with
  • 7
  • 7
  • 4
  • 4
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Extraction of Text Objects in Image and Video Documents

Zhang, Jing 01 January 2012 (has links)
The popularity of digital image and video is increasing rapidly. To help users navigate libraries of image and video, Content Based Information Retrieval (CBIR) system that can automatically index image and video documents are needed. However, due to the semantic gap between low-level machine descriptors and high-level semantic descriptors, the existing CBIR systems are still far from perfect. Text embedded in multi-media data, as a well-defined model of concepts for humans' communication, contains much semantic information related to the content. This text information can provide a much truer form of content-based access to the image and video documents if it can be extracted and harnessed efficiently. This dissertation solves the problem involved in detecting text object in image and video and tracking text event in video. For text detection problem, we propose a new unsupervised text detection algorithm. A new text model is constructed to describe text object using pictorial structure. Each character is a part in the model and every two neighboring characters are connected by a spring-like link. Two characters and the link connecting them are defined as a text unit. We localize candidate parts by extracting closed boundaries and initialize the links by connecting two neighboring candidate parts based on the spatial relationship of characters. For every candidate part, we compute character energy using three new character features, averaged angle difference of corresponding pairs, fraction of non-noise pairs, and vector of stroke width. They are extracted based on our observation that the edge of a character can be divided into two sets with high similarities in length, curvature, and orientation. For every candidate link, we compute link energy based on our observation that the characters of a text typically align along certain direction with similar color, size, and stroke width. For every candidate text unit, we combine character and link energies to compute text unit energy which indicates the probability that the candidate text model is a real text object. The final text detection results are generated using a text unit energy based thresholding. For text tracking problem, we construct a text event model by using pictorial structure as well. In this model, the detected text object in each video frame is a part and two neighboring text objects of a text event are connected by a spring-like link. Inter-frame link energy is computed for each link based on the character energy, similarity of neighboring text objects, and motion information. After refining the model using inter-frame link energy, the remaining text event models are marked as text events. At character level, because the proposed method is based on the assumption that the strokes of a character have uniform thickness, it can detect and localize characters from different languages in different styles, such as typewritten text or handwriting text, if the characters have approximately uniform stroke thickness. At text level, however, because the spatial relationship between two neighboring characters is used to localize text objects, the proposed method may fail to detect and localize the characters with multiple separate strokes or connected characters. For example, some East Asian language characters, such as Chinese, Japanese, and Korean, have many strokes of a single character. We need to group the strokes first to form single characters and then group characters to form text objects. While, the characters of some languages, such Arabic and Hindi, are connected together, we cannot extract spatial information between neighboring characters since they are detected as a single character. Therefore, in current stage the proposed method can detect and localize the text objects that are composed of separate characters with connected strokes with approximately uniform thickness. We evaluated our method comprehensively using three English language-based image and video datasets: ICDAR 2003/2005 text locating dataset (258 training images and 251 test images), Microsoft Street View text detection dataset (307 street view images), and VACE video dataset (50 broadcast news videos from CNN and ABC). The experimental results demonstrate that the proposed text detection method can capture the inherent properties of text and discriminate text from other objects efficiently.
2

Modul do serverové aplikace pro rozpoznávání identifikačních údajů z osobních dokladů

BARTYZAL, Miroslav January 2018 (has links)
This Master's thesis deals with the creation of a server-side system used for the automated reading of personal information from photographed identity documents. It is focused on the processing of photographs made by camera phones with respect to various quality of their images. Text localization in images and its recognition by means of neural network are the subject of this thesis. The final system is tested by the client application which was created for the Android operating system.
3

Detection of Frozen Video Subtitles Using Machine Learning

Sjölund, Jonathan January 2019 (has links)
When subtitles are burned into a video, an error can sometimes occur in the encoder that results in the same subtitle being burned into several frames, resulting in subtitles becoming frozen. This thesis provides a way to detect frozen video subtitles with the help of an implemented text detector and classifier. Two types of classifiers, naïve classifiers and machine learning classifiers, are tested and compared on a variety of different videos to see how much a machine learning approach can improve the performance. The naïve classifiers are evaluated using ground truth data to gain an understanding of the importance of good text detection. To understand the difficulty of the problem, two different machine learning classifiers are tested, logistic regression and random forests. The result shows that machine learning improves the performance over using naïve classifiers by improving the specificity from approximately 87.3% to 95.8% and improving the accuracy from 93.3% to 95.5%. Random forests achieve the best overall performance, but the difference compared to when using logistic regression is small enough that more computationally complex machine learning classifiers are not necessary. Using the ground truth shows that the weaker naïve classifiers would be improved by at least 4.2% accuracy, thus a better text detector is warranted. This thesis shows that machine learning is a viable option for detecting frozen video subtitles.
4

Localization And Recognition Of Text In Digital Media

Saracoglu, Ahmet 01 November 2007 (has links) (PDF)
Textual information within digital media can be used in many areas such as, indexing and structuring of media databases, in the aid of visually impaired, translation of foreign signs and many more. This said, mainly text can be separated into two categories in digital media as, overlay-text and scene-text. In this thesis localization and recognition of video text regardless of its category in digital media is investigated. As a necessary first step, framework of a complete system is discussed. Next, a comparative analysis of feature vector and classification method pairs is presented. Furthermore, multi-part nature of text is exploited by proposing a novel Markov Random Field approach for the classification of text/non-text regions. Additionally, better localization of text is achieved by introducing bounding-box extraction method. And for the recognition of text regions, a handprint based Optical Character Recognition system is thoroughly investigated. During the investigation of text recognition, multi-hypothesis approach for the segmentation of background is proposed by incorporating k-Means clustering. Furthermore, a novel dictionary-based ranking mechanism is proposed for recognition spelling correction. And overall system is simulated on a challenging data set. Also, a through survey on scene-text localization and recognition is presented. Furthermore, challenges are identified and discussed by providing related work on them. Scene-text localization simulations on a public competition data set are also provided. Lastly, in order to improve recognition performance of scene-text on signs that are affected from perspective projection distortion, a rectification method is proposed and simulated.
5

Text Localization for Unmanned Ground Vehicles

Kirchhoff, Allan Richard 16 October 2014 (has links)
Unmanned ground vehicles (UGVs) are increasingly being used for civilian and military applications. Passive sensing, such as visible cameras, are being used for navigation and object detection. An additional object of interest in many environments is text. Text information can supplement the autonomy of unmanned ground vehicles. Text most often appears in the environment in the form of road signs and storefront signs. Road hazard information, unmapped route detours and traffic information are available to human drivers through road signs. Premade road maps lack these traffic details, but with text localization the vehicle could fill the information gaps. Leading text localization algorithms achieve ~60% accuracy; however, practical applications are cited to require at least 80% accuracy [49]. The goal of this thesis is to test existing text localization algorithms against challenging scenes, identify the best candidate and optimize it for scenes a UGV would encounter. Promising text localization methods were tested against a custom dataset created to best represent scenes a UGV would encounter. The dataset includes road signs and storefront signs against complex background. The methods tested were adaptive thresholding, the stroke filter and the stroke width transform. A temporal tracking proof of concept was also tested. It tracked text through a series of frames in order to reduce false positives. Best results were obtained using the stroke width transform with temporal tracking which achieved an accuracy of 79%. That level of performance approaches requirements for use in practical applications. Without temporal tracking the stroke width transform yielded an accuracy of 46%. The runtime was 8.9 seconds per image, which is 44.5 times slower than necessary for real-time object tracking. Converting the MATLAB code to C++ and running the text localization on a GPU could provide the necessary speedup. / Master of Science
6

Segmentation Strategies for Scene Word Images

Anil Prasad, M N January 2014 (has links) (PDF)
No description available.
7

Camera-Captured Document Image Analysis

Kasar, Thotreingam 11 1900 (has links) (PDF)
Text is no longer confined to scanned pages and often appears in camera-based images originating from text on real world objects. Unlike the images from conventional flatbed scanners, which have a controlled acquisition environment, camera-based images pose new challenges such as uneven illumination, blur, poor resolution, perspective distortion and 3D deformations that can severely affect the performance of any optical character recognition (OCR) system. Due to the variations in the imaging condition as well as the target document type, traditional OCR systems, designed for scanned images, cannot be directly applied to camera-captured images and a new level of processing needs to be addressed. In this thesis, we study some of the issues commonly encountered in camera-based image analysis and propose novel methods to overcome them. All the methods make use of color connected components. 1. Connected component descriptor for document image mosaicing Document image analysis often requires mosaicing when it is not possible to capture a large document at a reasonable resolution in a single exposure. Such a document is captured in parts and mosaicing stitches them into a single image. Since connected components (CCs) in a document image can easily be extracted regardless of the image rotation, scale and perspective distortion, we design a robust feature named connected component descriptor that is tailored for mosaicing camera-captured document images. The method involves extraction of a circular measurement region around each CC and its description using the angular radial transform (ART). To ensure geometric consistency during feature matching, the ART coefficients of a CC are augmented with those of its 2 nearest neighbors. Our method addresses two critical issues often encountered in correspondence matching: (i) the stability of features and (ii) robustness against false matches due to multiple instances of many characters in a document image. We illustrate the effectiveness of the proposed method on camera-captured document images exhibiting large variations in viewpoint, illumination and scale. 2. Font and background color independent text binarization The first step in an OCR system, after document acquisition, is binarization, which converts a gray-scale/color image into a two-level image -the foreground text and the background. We propose two methods for binarization of color documents whereby the foreground text is output as black and the background as white regardless of the polarity of foreground-background shades. (a) Hierarchical CC Analysis: The method employs an edge-based connected component approach and automatically determines a threshold for each component. It overcomes several limitations of existing locally-adaptive thresholding techniques. Firstly, it can handle documents with multi-colored texts with different background shades. Secondly, the method is applicable to documents having text of widely varying sizes, usually not handled by local binarization methods. Thirdly, the method automatically computes the threshold for binarization and the logic for inverting the output from the image data and does not require any input parameter. However, the method is sensitive to complex backgrounds since it relies on the edge information to identify CCs. It also uses script-specific characteristics to filter out edge components before binarization and currently works well for Roman script only. (b) Contour-based color clustering (COCOCLUST): To overcome the above limitations, we introduce a novel unsupervised color clustering approach that operates on a ‘small’ representative set of color pixels identified using the contour information. Based on the assumption that every character is of a uniform color, we analyze each color layer individually and identify potential text regions for binarization. Experiments on several complex images having large variations in font, size, color, orientation and script illustrate the robustness of the method. 3. Multi-script and multi-oriented text extraction from scene images Scene text understanding normally involves a pre-processing step of text detection and extraction before subjecting the acquired image for character recognition task. The subsequent recognition task is performed only on the detected text regions so as to mitigate the effect of background complexity. We propose a color-based CC labeling for robust text segmentation from natural scene images. Text CCs are identified using a combination of support vector machine and neural network classifiers trained on a set of low-level features derived from the boundary, stroke and gradient information. We develop a semiautomatic annotation toolkit to generate pixel-accurate groundtruth of 100 scenic images containing text in various layout styles and multiple scripts. The overall precision, recall and f-measure obtained on our dataset are 0.8, 0.86 and 0.83, respectively. The proposed method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset, which, however, has only horizontal English text. The overall precision, recall and f-measure obtained are 0.63, 0.59 and 0.61 respectively, which is comparable to the best performing methods in the ICDAR 2005 text locating competition. A recent method proposed by Epshtein et al. [1] achieves better results but it cannot handle arbitrarily oriented text. Our method, however, works well for generic scene images having arbitrary text orientations. 4. Alignment of curved text lines Conventional OCR systems perform poorly on document images that contain multi-oriented text lines. We propose a technique that first identifies individual text lines by grouping adjacent CCs based on their proximity and regularity. For each identified text string, a B-spline curve is fitted to the centroids of the constituent characters and normal vectors are computed along the fitted curve. Each character is then individually rotated such that the corresponding normal vector is aligned with the vertical axis. The method has been tested on a data set consisting of 50 images with text laid out in various ways namely along arcs, waves, triangles and a combination of these with linearly skewed text lines. It yields 95.9% recognition accuracy on text strings, where, before alignment, state-of-the-art OCRs fail to recognize any text. The CC-based pre-processing algorithms developed are well-suited for processing camera-captured images. We demonstrate the feasibility of the algorithms on the publicly-available ICDAR 2003 robust reading competition dataset and our own database comprising camera-captured document images that contain multiple scripts and arbitrary text layouts.

Page generated in 0.133 seconds