Global ETD Search

Deep Web Collection Selection

The deep web contains a massive number of collections that are mostly invisible to search engines. These collections often contain high-quality, structured information that cannot be crawled using traditional methods. An important problem is selecting which of these collections to search. Automatic collection selection methods try to solve this problem by suggesting the best subset of deep web collections to search based on a query. A few methods for deep web collection selection have been proposed in the Collection Retrieval Inference Network (CORI) system and the Glossary of Servers Server (GlOSS) system. The drawback of these methods is that they require communication between the search broker and the collections, and need metadata about each collection. This thesis compares three different sampling methods that do not require communication with the broker or metadata about each collection. It also adapts some traditional information retrieval techniques to this area. In addition, the thesis tests these techniques on the INEX collection, using a total of 18 collections (comprising 12,232 XML documents) and 36 queries. The experiments show that the performance of the sample-based techniques is satisfactory on average.

http://eprints.qut.edu.au/15992/

information retrieval

deep web

collection selection

singular value decomposition

latent semantic analysis

sampling

query focused

probabilistic

Identifier	oai:union.ndltd.org:ADTP/264987
Date	January 2004
Creators	King, John Douglas
Publisher	Queensland University of Technology
Source Sets	Australasian Digital Theses Program
Detected Language	English
Rights	Copyright John Douglas King