An Analysis of Document Retrieval and Clustering Using an Effective Semantic Distance Measure

As large amounts of digital information become more and more accessible, the ability to effectively find relevant information is increasingly important. Search engines have historically performed well at finding relevant information by relying primarily on lexical and word-based measures. Similarly, standard approaches to organizing and categorizing large amounts of textual information have relied on lexical and word-based measures to perform grouping or classification tasks. Quite often, however, these processes take place without regard to semantics, or word meanings. This is perhaps because meaningful similarity is naturally qualitative, and thus difficult to incorporate into quantitative processes. In this thesis we formally present a method for computing quantitative document-level semantic distance, which is designed to model the degree to which humans would associate two documents with respect to conceptual similarity. We show how this metric can be applied to document retrieval and clustering problems. We conclude that while our metric is not well suited for text indexing, it can improve document retrieval through result set re-ranking and query expansion. We also conclude that it can improve document clustering in distance-based clustering algorithms.

Identifier: oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-2599
Date: 21 November 2008
Creators: Davis, Nathan Scott
Publisher: BYU ScholarsArchive
Source Sets: Brigham Young University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: http://lib.byu.edu/about/copyright/
