A Study of Graphically Chosen Features for Representation of TREC Topic-Document Sets

Document representation is important for computer-based text processing. A good document representation must include at least the most salient concepts of the document. Documents exist in a multidimensional space, which makes it difficult to identify which concepts to include. A current problem is measuring the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution toward this goal, this dissertation studied visual inter-document relationships in a dimensionally reduced space.

The same treatment was applied to the full text and to three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi-distribution in the whole document set. The third document representation identified features through a novel method. A Coefficient of Variability was calculated by normalizing the Cartesian distance of the discriminating value in the relevant and the non-relevant document subsets. The local dictionary method was also used. Cosine similarity values measured the inter-document distance in the information space and formed a matrix that served as input to the Multidimensional Scaling (MDS) procedure. A Precision-Recall procedure was averaged across all treatments to compare them statistically. The treatments were not statistically equivalent, and the null hypotheses were rejected.
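
The step from cosine similarity to an MDS layout can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not say which MDS variant was used or how similarities were converted to distances, so this example assumes classical (Torgerson) MDS and a distance of 1 minus cosine similarity. The toy term counts and the function names `cosine_similarity_matrix` and `classical_mds` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of a document-term matrix."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against empty documents
    unit = X / norms
    return unit @ unit.T


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix D in k dimensions.

    Assumption: the source does not name its MDS variant; this is one
    common choice, used here only for illustration.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    top_vals = np.clip(eigvals[order], 0.0, None)  # drop small negative values
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical toy data: 4 documents described by 4 term counts.
docs = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 2.0],
])

sim = cosine_similarity_matrix(docs)   # inter-document similarity matrix
dist = 1.0 - sim                       # assumed similarity-to-distance conversion
coords = classical_mds(dist, k=2)      # 2-D coordinates for plotting
```

In this toy set, documents 0 and 1 share vocabulary, as do documents 2 and 3, so the two pairs should land in separate regions of the reduced space when `coords` is plotted.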

Identifier: oai:union.ndltd.org:unt.edu/info:ark/67531/metadc2456
Date: 05 1900
Creators: Oyarce, Guillermo Alfredo
Contributors: Rorvig, Mark E., Young, Jon I., Turner, Philip M., 1948-, Totten, Herman L.
Publisher: University of North Texas
Source Sets: University of North Texas
Language: English
Detected Language: English
Type: Thesis or Dissertation
Format: Text
Rights: Use restricted to UNT Community, Copyright, Oyarce, Guillermo Alfredo, Copyright is held by the author, unless otherwise noted. All rights reserved.