Focused crawls, tunneling, and digital libraries

Crawling the Web to build collections of documents related
to pre-specified topics became an active area of research during the late
1990s, crawler technology having been developed for use by search engines.
Now, Web crawling is being seriously considered as an important
strategy for building large scale digital libraries. This paper covers some
of the crawl technologies that might be exploited for collection building.
For example, to make such collection-building crawls more effective,
focused crawling was developed, in which the goal was to make a
"best-first" crawl of the Web. We are using powerful crawler software to
implement a focused crawl but use tunneling to overcome some of the
limitations of a pure best-first approach. Tunneling has been described
by others as not only prioritizing links from pages according to the page's
relevance score, but also estimating the value of each link and prioritizing
them as well. We add to this mix by devising a tunneling focused crawling
strategy which evaluates the current crawl direction on the fly to
determine when to terminate a tunneling activity. Results indicate that
a combination of focused crawling and tunneling could be an effective
tool for building digital libraries.
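
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory link graph, the precomputed relevance scores, the `threshold` value, and the `max_tunnel` cutoff rule (abandon a path after a fixed number of consecutive off-topic pages) are all assumptions chosen for illustration. The abstract does not specify how crawl direction is actually evaluated.

```python
import heapq


def focused_crawl(graph, relevance, seeds, threshold=0.5, max_tunnel=2):
    """Best-first crawl over an in-memory link graph with a tunneling cutoff.

    graph:      dict mapping page -> list of linked pages (hypothetical data)
    relevance:  dict mapping page -> score in [0, 1] (assumed precomputed)
    max_tunnel: assumed limit on consecutive off-topic pages along one path
    Returns the on-topic pages in the order they were visited.
    """
    # Heap entries: (negated priority, insertion counter, page, tunnel depth).
    frontier = []
    counter = 0
    seen = set(seeds)
    for seed in seeds:
        heapq.heappush(frontier, (-1.0, counter, seed, 0))
        counter += 1

    collected = []
    while frontier:
        _, _, page, depth = heapq.heappop(frontier)
        score = relevance.get(page, 0.0)
        if score >= threshold:
            collected.append(page)
            depth = 0  # back on topic: the tunnel ends successfully
        else:
            depth += 1  # one more off-topic page in a row
            if depth > max_tunnel:
                continue  # give up on this direction, do not follow its links
        # Links inherit the score of the page they were found on as priority.
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, counter, link, depth))
                counter += 1
    return collected


# Toy graph: C is on topic but reachable only through off-topic X and Y.
toy_graph = {"A": ["B", "X"], "B": [], "X": ["Y"], "Y": ["C"], "C": []}
toy_relevance = {"A": 0.9, "B": 0.8, "X": 0.1, "Y": 0.2, "C": 0.9}

print(focused_crawl(toy_graph, toy_relevance, ["A"], max_tunnel=2))  # ['A', 'B', 'C']
print(focused_crawl(toy_graph, toy_relevance, ["A"], max_tunnel=0))  # ['A', 'B']
```

With `max_tunnel=0` the sketch behaves as a pure best-first crawl and never reaches C. Allowing two off-topic hops lets the crawl tunnel through X and Y to find it. Because pages are marked as seen when first enqueued, a page first reached through a deep tunnel is not revisited through a better path; a fuller version would need to handle that.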

Digital Libraries

Identifier	oai:union.ndltd.org:arizona.edu/oai:arizona.openrepository.com:10150/106527
Date	January 2002
Creators	Bergmark, Donna; Lagoze, Carl; Sbityakov, Alex
Source Sets	University of Arizona
Language	English
Detected Language	English
Type	Preprint
