Document representation is important for computer-based text processing. A good document representation must include at least the most salient concepts of the document. Documents exist in a multidimensional space, which makes it difficult to identify which concepts to include. A current problem is measuring the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution towards this goal, this dissertation studied visual inter-document relationships in a dimensionally reduced space.
The same treatment was applied to the full text and to three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi-distribution in the whole document set. The third document representation identified features through a novel method: a Coefficient of Variability was calculated by normalizing the Cartesian distance of the discriminating value in the relevant and the non-relevant document subsets. The local dictionary method was also used. Cosine similarity values measured the inter-document distance in the information space and formed a matrix that served as input to the multidimensional scaling (MDS) procedure. Precision-recall results were averaged across all treatments to compare them statistically. The treatments were found to differ statistically, and the null hypotheses were rejected.
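As an illustration of the cosine similarity matrix described above, the following is a minimal sketch, not the dissertation's implementation. It assumes raw term-frequency vectors built by whitespace tokenization; the function names (`term_vector`, `cosine_similarity`, `similarity_matrix`) and the three toy documents are hypothetical. The final step, converting similarities to dissimilarities as one minus the similarity, is a common convention for MDS input and is also an assumption here.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumed weighting) from a text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Square matrix of pairwise cosine similarities between documents."""
    vectors = [term_vector(d) for d in docs]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical toy documents, for illustration only.
docs = [
    "information retrieval systems",
    "information retrieval evaluation",
    "cooking pasta recipes",
]

sim = similarity_matrix(docs)

# Assumed convention: an MDS routine that expects dissimilarities
# can be given 1 - similarity.
dissim = [[1.0 - s for s in row] for row in sim]
```

In this sketch the first two documents share two of their three terms and so sit close together, while the third shares none and would be placed far from both in any low-dimensional layout derived from the matrix.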
Identifier | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc2456 |
Date | 05 1900 |
Creators | Oyarce, Guillermo Alfredo |
Contributors | Rorvig, Mark E., Young, Jon I., Turner, Philip M., 1948-, Totten, Herman L. |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | Text |
Rights | Use restricted to UNT Community, Copyright, Oyarce, Guillermo Alfredo, Copyright is held by the author, unless otherwise noted. All rights reserved. |