Cluster-based information retrieval systems often use a similarity measure to compute the
association among text documents. In this thesis, we focus on a class of similarity
measures named Query-Sensitive Similarity (QSS) measures. Recent studies have shown
QSS measures to positively influence the outcome of a clustering procedure. These
studies have used QSS measures in conjunction with the ltc term-weighting scheme.
Several term-weighting schemes have since superseded the ltc term-weighting scheme and
demonstrated better retrieval performance than it. We test whether
introducing one of these schemes, INQUERY, will offer any benefit over the ltc scheme
when used in the context of QSS measures. The testing procedure uses the Nearest
Neighbor (NN) test to quantify the clustering effectiveness of QSS measures and the
corresponding term-weighting scheme.
The NN tests are applied to certain standard test document collections and the results are
tested for statistical significance. On analyzing results of the NN test relative to those
obtained for the ltc scheme, we find several instances where the INQUERY scheme
improves the clustering effectiveness of QSS measures. To apply the NN test,
we designed a software test framework, Ferret, that complements the features provided
by dtSearch, a search engine. The test framework automates the generation of NN
coefficients by processing standard test document collection data. We describe
the construction and operation of the Ferret test framework.
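
As a rough illustration of the ideas named in the abstract, the sketch below computes ltc weights (logarithmic term frequency, inverse document frequency, cosine normalization, following the SMART naming convention) and then runs a simple nearest-neighbor count over the documents relevant to a query. It is not the Ferret framework and does not use dtSearch. The function names, the toy documents, and the default of five neighbors are assumptions made for this example. Plain cosine similarity stands in for a QSS measure, which would be passed in through the hypothetical `sim` parameter.

```python
import math
from collections import Counter


def ltc_vectors(docs):
    """Return one {term: weight} dict per document using ltc weighting.

    docs is a list of token lists. Weight = (1 + ln tf) * ln(N / df),
    followed by cosine (unit-length) normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: (1.0 + math.log(f)) * math.log(n_docs / df[t])
               for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # A document whose terms all occur in every document has zero norm.
        vectors.append({t: w / norm for t, w in raw.items()} if norm > 0 else {})
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors that are already unit length."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def nn_counts(vectors, relevant, n_neighbors=5, sim=cosine):
    """For each relevant document, count relevant documents among its
    n_neighbors most similar documents (illustrative NN test).

    relevant is a set of document indices judged relevant to one query.
    sim is the similarity function; a QSS measure could be substituted here.
    """
    counts = {}
    for i in relevant:
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: sim(vectors[i], vectors[j]), reverse=True)
        counts[i] = sum(1 for j in others[:n_neighbors] if j in relevant)
    return counts
```

Averaging the counts returned by `nn_counts` over all relevant documents gives a single figure per query; higher values indicate that relevant documents tend to sit close to one another under the chosen weighting scheme and similarity measure.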
Identifier | oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/3116 |
Date | 12 April 2006 |
Creators | Kini, Ananth Ullal |
Contributors | Nelson, Paul |
Publisher | Texas A&M University |
Source Sets | Texas A and M University |
Language | en_US |
Detected Language | English |
Type | Book, Thesis, Electronic Thesis, text |
Format | 286564 bytes, electronic, application/pdf, born digital |