Term weighting is a process of scoring and ranking a term’s relevance to a user’s information need or the importance of a term to a document. This thesis aims to investigate novel term weighting methods with applications in document representation for text classification, web document ranking, and ranked query suggestion. Firstly, this research proposes a new feature for document representation under the vector space model (VSM) framework, i.e., class specific document frequency (CSDF), which leads to a new term weighting scheme based on term frequency (TF) and the newly proposed feature. The experimental results show that the proposed methods, CSDF and TF-CSDF, improve the performance of document classification in comparison with other widely used VSM document representations. Secondly, a new ranking method called GCrank is proposed for re-ranking web documents returned from search engines using document classification scores. The experimental results show that the GCrank method can improve the performance of web returned document ranking in terms of several commonly used evaluation criteria. Finally, this research investigates several state-of-the-art ranked retrieval methods, adapts and combines them as well, leading to a new method called Tfjac for ranked query suggestion, which is based on the combination between TF-IDF and Jaccard coefficient methods. The experimental results show that Tfjac is the best method for query suggestion among the methods evaluated. It outperforms the most popularly used TF-IDF method in terms of increasing the number of highly relevant query suggestions.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:712552 |
Date | January 2017 |
Creators | Plansangket, Suthira |
Publisher | University of Essex |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://repository.essex.ac.uk/19456/ |
Page generated in 0.0018 seconds