Global ETD Search

Return to search

Information retrieval via universal source coding

This dissertation explores the intersection of information retrieval and universal source coding techniques and studies an optimal multidimensional source representation from an information theoretic point of view. Previous research on information retrieval particularly focus on learning probabilistic or deterministic source models based on primarily two different types of source representations, e.g., fixed-shape partitions or uniform regions. We study the limitations of the conventional source representations on capturing the semantics of the given multidimensional source sequences and propose a new type of primitive source representation generated by a universal source coding technique. We propose a multidimensional incremental parsing algorithm extended from the Lempel-Ziv incremental parsing and its three component schemes for multidimensional source coding. The properties of the proposed coding algorithm are exploited under two-dimensional lossless and lossy source coding. By the proposed coding algorithm, a given multidimensional source sequence is parsed into a number of variable-size patches. We call this methodology a parsed representation.

Based on the source representation, we propose an information retrieval framework that analyzes a set of source sequences under a linguistic processing technique and implemented content-based image retrieval systems. We examine the relevance of the proposed source representation by comparing it with the conventional representation of visual information. To further extend the proposed framework, we apply a probabilistic linguistic processing technique to modeling the latent aspects of a set of documents. In addition, beyond the symbol-wise pattern matching paradigm employed in the source coding and the image retrieval systems, we devise a robust pattern matching that compares the first- and second-order statistics of source patches. Qualitative and quantitative analysis of the proposed framework justifies the superiority of the proposed information retrieval framework based on the parsed representation. The proposed
source representation technique and the information retrieval frameworks encourage future work in exploiting a systematic way of understanding multidimensional sources that parallels a linguistic structure.

http://hdl.handle.net/1853/26573

Image retrieval

Image compression

Information retrieval

Machine learning

Universal source coding

Data compression

Incremental parsing

Pattern recognition

Natural language processing

Pattern recognition systems

Pattern perception

Computer science

Multidimensional databases

Identifer	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/26573
Date	17 November 2008
Creators	Bae, Soo Hyun
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Detected Language	English
Type	Dissertation

Page generated in 0.0021 seconds

Information retrieval via universal source coding

Description

Links & Downloads

Tags

Additional Fields