Spelling suggestions: "subject:"text tracking"" "subject:"next tracking""
1 |
Extraction of Text Objects in Image and Video DocumentsZhang, Jing 01 January 2012 (has links)
The popularity of digital image and video is increasing rapidly. To help users navigate libraries of image and video, Content Based Information Retrieval (CBIR) system that can automatically index image and video documents are needed. However, due to the semantic gap between low-level machine descriptors and high-level semantic descriptors, the existing CBIR systems are still far from perfect. Text embedded in multi-media data, as a well-defined model of concepts for humans' communication, contains much semantic information related to the content. This text information can provide a much truer form of content-based access to the image and video documents if it can be extracted and harnessed efficiently.
This dissertation solves the problem involved in detecting text object in image and video and tracking text event in video. For text detection problem, we propose a new unsupervised text detection algorithm. A new text model is constructed to describe text object using pictorial structure. Each character is a part in the model and every two neighboring characters are connected by a spring-like link. Two characters and the link connecting them are defined as a text unit. We localize candidate parts by extracting closed boundaries and initialize the links by connecting two neighboring candidate parts based on the spatial relationship of characters. For every candidate part, we compute character energy using three new character features, averaged angle difference of corresponding pairs, fraction of non-noise pairs, and vector of stroke width. They are extracted based on our observation that the edge of a character can be divided into two sets with high similarities in length, curvature, and orientation. For every candidate link, we compute link energy based on our observation that the characters of a text typically align along certain direction with similar color, size, and stroke width. For every candidate text unit, we combine character and link energies to compute text unit energy which indicates the probability that the candidate text model is a real text object. The final text detection results are generated using a text unit energy based thresholding. For text tracking problem, we construct a text event model by using pictorial structure as well. In this model, the detected text object in each video frame is a part and two neighboring text objects of a text event are connected by a spring-like link. Inter-frame link energy is computed for each link based on the character energy, similarity of neighboring text objects, and motion information. After refining the model using inter-frame link energy, the remaining text event models are marked as text events.
At character level, because the proposed method is based on the assumption that the strokes of a character have uniform thickness, it can detect and localize characters from different languages in different styles, such as typewritten text or handwriting text, if the characters have approximately uniform stroke thickness. At text level, however, because the spatial relationship between two neighboring characters is used to localize text objects, the proposed method may fail to detect and localize the characters with multiple separate strokes or connected characters. For example, some East Asian language characters, such as Chinese, Japanese, and Korean, have many strokes of a single character. We need to group the strokes first to form single characters and then group characters to form text objects. While, the characters of some languages, such Arabic and Hindi, are connected together, we cannot extract spatial information between neighboring characters since they are detected as a single character. Therefore, in current stage the proposed method can detect and localize the text objects that are composed of separate characters with connected strokes with approximately uniform thickness.
We evaluated our method comprehensively using three English language-based image and video datasets: ICDAR 2003/2005 text locating dataset (258 training images and 251 test images), Microsoft Street View text detection dataset (307 street view images), and VACE video dataset (50 broadcast news videos from CNN and ABC). The experimental results demonstrate that the proposed text detection method can capture the inherent properties of text and discriminate text from other objects efficiently.
|
2 |
Reconhecimento de texto e rastreamento de objetos 2D/3D / Text recognition and 2D/3D object trackingMinetto, Rodrigo, 1983- 20 August 2018 (has links)
Orientadores: Jorge Stolfi, Neucimar Jerônimo Leite / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-20T03:12:07Z (GMT). No. of bitstreams: 1
Minetto_Rodrigo_D.pdf: 35894128 bytes, checksum: 8a0e453fba7e6a9a02fb17a52fdbf878 (MD5)
Previous issue date: 2012 / Resumo: Nesta tese abordamos três problemas de visão computacional: (1) detecção e reconhecimento de objetos de texto planos em imagens de cenas reais; (2) rastreamento destes objetos de texto em vídeos digitais; e (3) o rastreamento de um objeto tridimensional rígido arbitrário com marcas conhecidas em um vídeo digital. Nós desenvolvemos, para cada um dos problemas, algoritmos inovadores, que são pelo menos tão precisos e robustos quanto outros algoritmos estado-da-arte. Especificamente, para reconhecimento de texto nós desenvolvemos (e validamos extensivamente) um novo descritor de imagem baseado em HOG especializado para escrita romana, que denominamos T-HOG, e mostramos sua contribuição como um filtro em um detector de texto (SNOOPERTEXT). Nós também melhoramos o algoritmo SNOOPERTEXT através do uso da técnica multiescala para tratar caracteres de tamanhos bastante variados e limitar a sensibilidade do algoritmo a vários artefatos. Para rastreamento de texto, nós descrevemos quatro estratégias básicas para combinar a detecção e o rastreamento de texto, e desenvolvemos também um rastreador específico baseado em filtro de partículas que explora o uso do reconhecedor T-HOG. Para o rastreamento de objetos rígidos, nós desenvolvemos um novo algoritmo preciso e robusto (AFFTRACK) que combina rastreamento de características por KLT com uma calibração de câmera melhorada. Nós testamos extensivamente nossos algoritmos com diversas bases de dados descritas na literatura. Nós também desenvolvemos algumas bases de dados (publicamente disponíveis) para a validação de algoritmos de detecção e rastreamento de texto e de rastreamento de objetos rígidos em vídeos / Abstract: In this thesis we address three computer vision problems: (1) the detection and recognition of flat text objects in images of real scenes; (2) the tracking of such text objects in a digital video; and (3) the tracking an arbitrary three-dimensional rigid object with known markings in a digital video. For each problem we developed innovative algorithms, which are at least as accurate and robust as other state-of-the-art algorithms. Specifically, for text classification we developed (and extensively evaluated) a new HOG-based descriptor specialized for Roman script, which we call T-HOG, and showed its value as a post-filter for an existing text detector (SNOOPERTEXT). We also improved the SNOOPERTEXT algorithm by using the multi-scale technique to handle widely different letter sizes while limiting the sensitivity of the algorithm to various artifacts. For text tracking, we describe four basic ways of combining a text detector and a text tracker, and we developed a specific tracker based on a particle-filter which exploits the T-HOG recognizer. For rigid object tracking we developed a new accurate and robust algorithm (AFFTRACK) that combines the KLT feature tracker with an improved camera calibration procedure. We extensively tested our algorithms on several benchmarks well-known in the literature. We also created benchmarks (publicly available) for the evaluation of text detection and tracking and rigid object tracking algorithms / Doutorado / Ciência da Computação / Doutor em Ciência da Computação
|
Page generated in 0.1752 seconds