Global ETD Search

Return to search

An Analysis of Document Retrieval and Clustering Using an Effective Semantic Distance Measure

As large amounts of digital information become more and more accessible, the ability to effectively find relevant information is increasingly important. Search engines have historically performed well at finding relevant information by relying primarily on lexical and word based measures. Similarly, standard approaches to organizing and categorizing large amounts of textual information have previously relied on lexical and word based measures to perform grouping or classification tasks. Quite often, however, these processes take place without respect to semantics, or word meanings. This is perhaps due to the fact that the idea of meaningful similarity is naturally qualitative, and thus difficult to incorporate into quantitative processes. In this thesis we formally present a method for computing quantitative document-level semantic distance, which is designed to model the degree to which humans would associate two documents with respect to conceptual similarity. We show how this metric can be applied to document retrieval and clustering problems. We conclude that while our metric is not well suited for text indexing, the use of our semantic distance metric can improve document retrieval through result set re-ranking and query expansion. We also conclude that our semantic distance metric can be used to improve document clustering in distance-based clustering algorithms.

computational linguistics

semantics

computer science

document clustering

information retrieval

Computer Sciences

Identifer	oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-2599
Date	21 November 2008
Creators	Davis, Nathan Scott
Publisher	BYU ScholarsArchive
Source Sets	Brigham Young University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations
Rights	http://lib.byu.edu/about/copyright/

Page generated in 0.0016 seconds

An Analysis of Document Retrieval and Clustering Using an Effective Semantic Distance Measure

Description

Links & Downloads

Tags

Additional Fields