A term weighting scheme measures the importance of a term in a collection. A document ranking model uses these term weights to find the rank or score of a document in a collection. We present a series of cluster-based term weighting and document ranking models based on the TF-IDF and Okapi BM25 models. These term weighting and document ranking models update the inter-cluster and intra-cluster frequency components based on the generated clusters. These inter-cluster and intra-cluster frequency components are used for weighting the importance of a term in addition to the term and document frequency components. In this thesis, we will show how these models outperform the TF-IDF and Okapi BM25 models in document clustering and ranking.
Identifer | oai:union.ndltd.org:uky.edu/oai:uknowledge.uky.edu:gradschool_theses-1164 |
Date | 01 January 2011 |
Creators | Murugesan, Keerthiram |
Publisher | UKnowledge |
Source Sets | University of Kentucky |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | University of Kentucky Master's Theses |
Page generated in 0.0013 seconds