
A Common Representation Format for Multimedia Documents

Multimedia documents combine multiple file formats, such as image and text, image and sound, or image, text, and sound. The type of multimedia document determines the form of analysis used for knowledge architecture design and retrieval methods. Over the last few decades, theories of text analysis have been proposed and applied effectively. In recent years, theories of image and sound analysis have been proposed to work with text retrieval systems, and they have progressed quickly, due in part to rapid gains in computer processing speed. Retrieval of multimedia documents was formerly divided into two categories: image and text, and image and sound. While the standard retrieval process begins from text only, methods are being developed that allow retrieval to use text and image simultaneously. Although image processing for feature extraction and text processing for term extraction are well understood, no prior method combines these two kinds of features into a single data structure. This dissertation introduces a common representation format for multimedia documents (CRFMD) composed of both images and text. For image and text analysis, two techniques are used: the Lorenz Information Measurement and the Word Code. A new process named Jeong's Transform is demonstrated for extraction of text and image features, combining the two previous measurements to form a single data structure. Finally, this single data structure is analyzed using multidimensional scaling, which allows multimedia objects to be represented as vectors on a two-dimensional graph. The distance between vectors represents the magnitude of the difference between multimedia documents. This study shows that image classification on a given test set is dramatically improved when text features are encoded together with image features. This effect appears to hold true even when the available text is diffuse and not uniform with the image features. The retrieval system works by representing a multimedia document as a single data structure. CRFMD is applicable to other areas of multimedia document retrieval and processing, such as medical image retrieval, World Wide Web searching, and museum collection retrieval.
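
The scaling step described in the abstract can be illustrated with a minimal sketch. This record does not specify the Lorenz Information Measurement, the Word Code, or Jeong's Transform, so none of them is reproduced here. As a labeled assumption, the hypothetical `combine` function simply concatenates made-up image and text feature values into one vector, and classical (Torgerson) multidimensional scaling then places each document on a two-dimensional plane so that distances between points approximate distances between the combined vectors. All function names and feature values below are illustrative only.

```python
import math
import random


def combine(image_features, text_features):
    """Hypothetical stand-in for a combined representation: plain
    concatenation. This is NOT Jeong's Transform, which is not
    specified in the abstract."""
    return list(image_features) + list(text_features)


def distance_matrix(vectors):
    """Pairwise Euclidean distances between equal-length vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


def _top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric
    positive semi-definite matrix, found by power iteration."""
    n = len(matrix)
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))
    return eigenvalue, v


def classical_mds(distances, dims=2):
    """Classical multidimensional scaling of a symmetric distance matrix."""
    n = len(distances)
    sq = [[d * d for d in row] for row in distances]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        eigenvalue, v = _top_eigenpair(b)
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


if __name__ == "__main__":
    # Made-up feature values for four documents (illustration only).
    docs = [
        combine([0.0, 0.0], [0.0, 0.0]),
        combine([4.0, 0.0], [0.0, 0.0]),
        combine([0.0, 0.0], [1.0, 0.0]),
        combine([4.0, 0.0], [1.0, 0.0]),
    ]
    for point in classical_mds(distance_matrix(docs)):
        print([round(c, 3) for c in point])
```

In this sketch, documents whose combined vectors are close together land close together on the plane, which mirrors the abstract's statement that the distance between vectors represents the magnitude of the difference between documents. The choice of classical scaling is also an assumption, since the abstract does not say which variant of multidimensional scaling was used.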

Identifier: oai:union.ndltd.org:unt.edu/info:ark/67531/metadc3336
Date: 12 1900
Creators: Jeong, Ki Tai
Contributors: Rorvig, Mark E., O'Connor, Brian Clark, Ji, Minhe, Hastings, Samantha Kelly
Publisher: University of North Texas
Source Sets: University of North Texas
Language: English
Detected Language: English
Type: Thesis or Dissertation
Format: Text
Rights: Public, Copyright, Jeong, Ki Tai, Copyright is held by the author, unless otherwise noted. All rights reserved.
