Return to search

Deep Web Collection Selection

The deep web contains a massive number of collections that are mostly invisible to search engines. These collections often contain high-quality, structured information that cannot be crawled using traditional methods. An important problem is selecting which of these collections to search. Automatic collection selection methods try to solve this problem by suggesting the best subset of deep web collections to search based on a query. A few methods for deep Web collection selection have proposed in Collection Retrieval Inference Network system and Glossary of Servers Server system. The drawback in these methods is that they require communication between the search broker and the collections, and need metadata about each collection. This thesis compares three different sampling methods that do not require communication with the broker or metadata about each collection. It also transforms some traditional information retrieval based techniques to this area. In addition, the thesis tests these techniques using INEX collection for total 18 collections (including 12232 XML documents) and total 36 queries. The experiment shows that the performance of sample-based technique is satisfactory in average.

Identiferoai:union.ndltd.org:ADTP/264987
Date January 2004
CreatorsKing, John Douglas
PublisherQueensland University of Technology
Source SetsAustraliasian Digital Theses Program
Detected LanguageEnglish
RightsCopyright John Douglas King

Page generated in 0.005 seconds