Meta-search engines are finding tools developed to enhance search performance by submitting user queries to multiple search engines and combining the search results into a unified ranked list. They use a data fusion technique, which requires three major steps: database selection, results combination, and results merging.
This study attempts to build a framework that can be used for merging the search results retrieved from any set of search engines. The framework is based on answering three major questions:
1. How can meta-search developers define the optimal rank order for the selected engines?
2. How can meta-search developers choose the best combination of search engines?
3. What is the optimal heuristic merging function for aggregating the rank order of documents retrieved from incomparable search engines?
The main data collection process depended on running 40 general queries on three major search engines (Google, AltaVista, and Alltheweb). Real users were involved in the relevance judgment process, rating results on a five-point relevancy scale. The performance of the three search engines, their different combinations, and different merging algorithms was compared in order to rank the databases, choose the best combination, and define the optimal merging function.
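The database-ranking step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the five-point judgments were aggregated, so the sketch simply averages the grades per query and then across queries. The function name `rank_engines`, the placeholder engine names, and the grades are hypothetical and are not data from the study.

```python
from statistics import mean


def rank_engines(judgments):
    """Order engines by mean graded relevance (an assumed performance measure).

    judgments maps an engine name to a list of queries, where each query is a
    list of relevance grades (1-5) given to that engine's retrieved documents.
    """
    scores = {
        engine: mean(mean(grades) for grades in per_query)
        for engine, per_query in judgments.items()
    }
    # Highest average grade first.
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


# Hypothetical grades for two queries per engine (not the study's data).
example = {
    "engine_a": [[4, 3, 2], [5, 1]],
    "engine_b": [[1, 1, 2], [2, 2]],
    "engine_c": [[5, 5, 4], [4, 4]],
}
order, scores = rank_engines(example)
```

Averaging per query first keeps a query with many retrieved documents from dominating the engine's score; other measures, such as precision at a fixed cutoff, would fit the same structure.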
The major findings of this study are: (1) ranking the databases in the merging process should depend on their overall performance, not their popularity or size; (2) larger databases tend to perform better than smaller databases; (3) the combination of search engines should depend on ranking the databases and choosing the appropriate combination function; (4) search engines tend to retrieve more overlapping relevant documents than overlapping irrelevant documents; and (5) merging functions that take overlapping documents into account tend to perform better than the interleave and rank similarity functions.
In addition to these findings, the study developed a set of requirements for the merging process to be successful. These requirements cover database selection, combination, and merging based on heuristic solutions.
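To make the contrast between interleaving and an overlap-aware merge concrete, here is a minimal sketch. It is not the merging function developed in the thesis, which the abstract does not specify. As an assumption, the overlap-aware variant scores each document by the sum of its reciprocal ranks across engines, so a document returned by several engines moves up. The function names and document identifiers are hypothetical, and each input list is assumed to contain no internal duplicates.

```python
from collections import defaultdict


def interleave(result_lists):
    """Round-robin merge: take rank 1 from each engine, then rank 2, and so on.

    A document seen earlier is skipped, so overlap earns no extra credit.
    """
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for pos in range(longest):
        for results in result_lists:
            if pos < len(results) and results[pos] not in seen:
                seen.add(results[pos])
                merged.append(results[pos])
    return merged


def overlap_merge(result_lists):
    """Overlap-aware merge (assumed scoring: sum of reciprocal ranks)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / rank
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical ranked lists from three engines; "c" is returned by all three.
lists = [["a", "b", "c"], ["d", "c", "b"], ["e", "c", "f"]]
by_interleave = interleave(lists)
by_overlap = overlap_merge(lists)
```

In this example the document returned by all three engines is placed first by the overlap-aware merge but only fifth by interleaving, which is the kind of difference finding (5) refers to.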
Identifier | oai:union.ndltd.org:PITT/oai:PITTETD:etd-02032004-163252 |
Date | 31 January 2006 |
Creators | Mohamed, Khaled Abd El-Fatah |
Contributors | Amy Knapp, Christinger Tomer, Donald King, José-Marie Griffiths |
Publisher | University of Pittsburgh |
Source Sets | University of Pittsburgh |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.library.pitt.edu/ETD/available/etd-02032004-163252/ |
Rights | unrestricted, I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to University of Pittsburgh or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. |