1 |
Measuring the Stability of Query Term Collocations and Using it in Document Ranking
Alshaar, Rana, January 2008
Delivering the right information to the user is fundamental to any information retrieval system. Many traditional information retrieval models assume word independence and view a document as a bag of words; however, getting the right information requires a deep understanding of the content of the document and the relationships that exist between words in the text.
This study develops two new document ranking techniques based on the lexical cohesive relationship of collocation. Collocation is a semantic relationship between words that co-occur in the same lexical environment. Two types of collocation are considered: collocation within the same grammatical structure (such as a sentence), and collocation within the same semantic structure, where query terms occur in different sentences but co-occur with the same words.
In the first technique, only the first type of collocation is used to calculate the document score: the positional frequency of query term co-occurrence is used to identify collocation relationships between query terms and to calculate each query term’s weight.
In the second technique, both types of collocation are considered: the co-occurrence frequency distribution within a predefined window is used to determine query term collocations and to compute each query term’s weight. Evaluation of the proposed techniques shows performance gains for some of the collocations over the chosen baseline runs.
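The idea of counting query term co-occurrences within a predefined window can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `window_cooccurrence`, the default window size, and the choice to count unordered pairs of distinct query terms are all assumptions made for illustration.

```python
from collections import Counter


def window_cooccurrence(tokens, query_terms, window=5):
    """Count pairs of distinct query terms that occur within `window` tokens.

    Hypothetical helper: the window size and the pair-counting rule are
    illustrative assumptions, not the method used in the thesis.
    """
    wanted = set(query_terms)
    # Keep only the positions where a query term occurs, in document order.
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in wanted]
    counts = Counter()
    for a, (i, first) in enumerate(hits):
        for j, second in hits[a + 1:]:
            if j - i > window:
                break  # later hits are even farther away
            if first != second:
                # Store the pair in sorted order so (a, b) and (b, a) merge.
                counts[tuple(sorted((first, second)))] += 1
    return counts


text = "term proximity helps ranking when term and proximity are far apart"
pair_counts = window_cooccurrence(text.split(), ["term", "proximity"], window=2)
```

A ranking function could then turn such counts into term weights; how that weighting is done is specific to the thesis and is not reproduced here.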
|
3 |
Relevance Analysis for Document Retrieval
Labouve, Eric, 01 March 2019
Document retrieval systems recover documents from a dataset and order them according to their perceived relevance to a user’s search query. This is a difficult task for machines because there is a semantic gap between the literal meaning of the terms in a user’s query and the user’s true intentions. Despite the ambiguity that arises from this lack of context, users still expect the set of documents returned by a search engine to be both highly relevant to their query and properly ordered. This thesis focuses on document retrieval systems that order documents from unstructured, textual corpora using text queries. The main goal of the study is to enhance the Okapi BM25 document retrieval model. To that end, the research hypothesizes that the structure of text inside documents and queries holds valuable semantic information that can be incorporated into the Okapi BM25 model to increase its performance. Okapi BM25 is augmented with modifications that account for a term’s part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion. The study produced 87 modifications, all of which were validated using open source corpora. The top-scoring modification from the validation phase was then tested on the LISA corpus, where it performed 10.25% better than Okapi BM25 when evaluated by mean average precision. When compared against two industry-standard search engines, Lucene and Solr, the top-scoring modification outperforms them by up to 21.78% and 23.01%, respectively.
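For reference, the baseline Okapi BM25 scoring that the thesis builds on can be sketched in Python as follows. This is a minimal illustration only: the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are assumptions, and none of the 87 modifications described in the abstract are reproduced.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query` with Okapi BM25.

    Assumes `docs` is a non-empty list of non-empty token lists. The IDF
    form log((N - n + 0.5) / (n + 0.5) + 1) is one common non-negative
    variant, chosen here for illustration.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization shared by every term in this document.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    ["term", "proximity", "ranking"],
    ["document", "retrieval"],
    ["term", "weighting"],
]
scores = bm25_scores(["term", "proximity"], docs)
```

Modifications such as a term proximity bonus would typically be added as extra terms or multipliers on top of this score; their exact form in the thesis is not stated in the abstract.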
|