1 |
Rocchio, Ide, Okapi och BIM : En komparativ studie av fyra metoder för relevance feedback / Rocchio, Ide, Okapi and BIM: A comparative study of four methods for relevance feedback
Eriksen, Martin, January 2008
This thesis compares four relevance feedback methods: the Rocchio and Ide dec-hi algorithms for the vector space model, and the binary independence model and Okapi BM25 within the probabilistic framework. The comparison is carried out in a custom-made information retrieval system on a collection of 131 896 LA-Times articles that is part of the TREC ad-hoc collection. The methods are compared on two grounds: using only the relevance information from the 20 highest-ranked documents of an initial search, and using all available relevance information. On the first ground, a significant effect of the choice of method was found, but post-hoc analysis could not determine any statistically significant differences between the methods; Rocchio, Ide dec-hi and Okapi BM25 performed equivalently. All methods except the binary independence model performed significantly better than using no relevance feedback. Although the binary independence model performed far worse on average than the other methods, it outperformed them on nearly 20 % of the topics. Further analysis attributed this to the lack of query expansion in the binary independence model, which is advantageous for some topics but has a negative effect on retrieval efficiency in general. On the second ground, Okapi BM25 performed significantly better than the other methods, with the binary independence model once again the worst performer. The thesis argues that the other methods have problems scaling to large amounts of relevance information, whereas Okapi BM25 has no such issues. / Thesis level: D
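For readers unfamiliar with the vector space methods named in the abstract, the following is a minimal sketch of the textbook Rocchio update, not the thesis's own implementation. The function name `rocchio`, the dictionary representation of term-weight vectors, and the default weights `alpha`, `beta` and `gamma` are all assumptions chosen for illustration.

```python
from collections import Counter


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted, expanded query vector (textbook Rocchio).

    query        -- dict mapping term -> weight
    relevant     -- list of dicts (term -> weight) judged relevant
    nonrelevant  -- list of dicts (term -> weight) judged non-relevant
    alpha, beta, gamma -- illustrative defaults, not values from the thesis
    """
    new_query = Counter()
    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Move towards the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    # Move away from the centroid of the non-relevant documents.
    for doc in nonrelevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(nonrelevant)
    # Negative weights are conventionally dropped.
    return {term: w for term, w in new_query.items() if w > 0}
```

Terms that occur only in the relevant documents enter the new query, which is the query expansion the abstract says the binary independence model lacks. Ide dec-hi differs from this sketch in that it omits the normalisation by the number of documents and subtracts only the highest-ranked non-relevant document.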
2 |
Relevance Analysis for Document Retrieval
Labouve, Eric, 01 March 2019
Document retrieval systems recover documents from a dataset and order them according to their perceived relevance to a user’s search query. This is a difficult task for machines because there is a semantic gap between the literal meaning of the terms in a user’s query and the user’s true intentions. Despite the ambiguity that arises from this lack of context, users still expect the set of documents returned by a search engine to be both highly relevant to their query and properly ordered. This thesis focuses on document retrieval systems that order documents from unstructured, textual corpora using text queries. The main goal of the study is to enhance the Okapi BM25 document retrieval model. To that end, the research hypothesizes that the structure of text inside documents and queries holds valuable semantic information that can be incorporated into the Okapi BM25 model to increase its performance. Okapi BM25 is augmented with modifications that account for a term’s part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion. The study resulted in 87 modifications, all of which were validated using open source corpora. The top-scoring modification from the validation phase was then tested on the LISA corpus, where it performed 10.25% better than Okapi BM25 when evaluated by mean average precision. Compared against two industry-standard search engines, Lucene and Solr, the top-scoring modification outperforms these systems by up to 21.78% and 23.01%, respectively.
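As background for the baseline that this thesis modifies, here is a minimal sketch of plain Okapi BM25 scoring over tokenised documents. It is not the thesis's code and includes none of its 87 modifications. The function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the non-negative IDF form with `1 +` inside the logarithm are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against `query_terms`.

    docs -- list of token lists; k1 and b are illustrative defaults.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Non-negative IDF variant (an assumption of this sketch).
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term-frequency saturation with document-length normalisation.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

A document that repeats a query term scores higher than one of equal length that mentions it once, but the gain flattens as the term frequency grows; `k1` controls that saturation and `b` controls how strongly long documents are penalised.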