The widely used vector model remains popular because of its simplicity, its speed, and the appeal of using spatial proximity as a stand-in for semantic proximity. However, the model has a drawback: matching documents by overlapping keywords is inherently vague. Efforts to improve the vector model through better document representation have focused on four areas, namely, statistical co-occurrence of related items, forming term phrases, grouping of related words, and representing the content of documents. In this thesis, we propose the idea-indexing model to improve document representation for the filtering task in IR. The idea-indexing model matches document terms with the ideas they express and indexes the document with these ideas. This indexing scheme represents the document by its semantics instead of by sets of independent terms. We show in this thesis that indexing with ideas leads to better performance.
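The contrast between keyword overlap and indexing by ideas can be illustrated with a minimal sketch. It is not the thesis's implementation: the `TERM_TO_IDEA` lexicon, the idea labels, and the helper names are all hypothetical, and plain cosine similarity over raw counts is assumed as the vector model.

```python
import math
from collections import Counter

# Hypothetical term-to-idea lexicon (an assumption for illustration only;
# the abstract does not say how terms are matched to ideas).
TERM_TO_IDEA = {
    "car": "vehicle",
    "automobile": "vehicle",
    "doctor": "medical_practitioner",
    "physician": "medical_practitioner",
}

def term_vector(text):
    """Index a text by its raw terms (classic vector model)."""
    return Counter(text.lower().split())

def idea_vector(text):
    """Index a text by the idea each term maps to; unmapped terms are kept as-is."""
    return Counter(TERM_TO_IDEA.get(t, t) for t in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

document = "doctor buys automobile"
profile = "physician buys car"  # e.g. a user's filtering profile

term_score = cosine(term_vector(document), term_vector(profile))
idea_score = cosine(idea_vector(document), idea_vector(profile))
print(f"term-based: {term_score:.3f}, idea-based: {idea_score:.3f}")
```

In this toy case the two texts share only one keyword, so the term-based score is low, while the idea-based vectors coincide because every term maps to the same idea in both texts.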
Identifier | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc4275 |
Date | 08 1900 |
Creators | Yang, Li |
Contributors | Mihalcea, Rada, 1974-, Swigger, Kathleen M., Brazile, Robert |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | Text |
Rights | Public, Copyright, Yang, Li, Copyright is held by the author, unless otherwise noted. All rights reserved. |