A Domain Based Approach to Crawl the Hidden Web

A great deal of research is being done on indexing the Web, and increasingly sophisticated Web crawlers are being designed to search and index it faster. However, these traditional crawlers cover only the part of the Web known as the “Surface Web”: pages that are reachable by following hyperlinks. They cannot crawl the hidden portion of the Web, and hence they ignore the tremendous amount of information hidden behind the search forms in Web pages. Most published research has focused on detecting such searchable forms and querying them systematically. Our approach is based on a Web crawler that analyzes search forms and fills them with appropriate content to retrieve the maximum amount of relevant information from the underlying database.
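
The abstract does not give implementation details, so the following is only a rough sketch of the general idea of analyzing a search form and filling it from a domain vocabulary, not the method used in the thesis. The names FormExtractor, build_queries, and domain_values are hypothetical, and the sketch assumes simple GET forms whose field names match keys in the vocabulary and whose action URL carries no query string of its own.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormExtractor(HTMLParser):
    """Collect each <form> with its action, method, and named input fields."""

    # Input types that trigger or decorate a submission rather than carry a query.
    SKIPPED_TYPES = ("submit", "button", "reset", "image")

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            field_type = (attrs.get("type") or "text").lower()
            if name and field_type not in self.SKIPPED_TYPES:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def build_queries(page_url, html, domain_values):
    """Return one GET URL per form that has fields covered by domain_values.

    domain_values is a hypothetical mapping from form field names to the
    values a domain-specific crawler would want to submit.
    """
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    urls = []
    for form in parser.forms:
        if form["method"] != "get":
            continue  # POST forms would need a request body; out of scope here
        filled = {f: domain_values[f] for f in form["fields"] if f in domain_values}
        if filled:
            urls.append(urljoin(page_url, form["action"]) + "?" + urlencode(filled))
    return urls
```

A crawler built along these lines would fetch each generated URL and index the result pages. A fuller version would also have to decide which forms are actually searchable and how to choose values for fields it does not recognize.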

Identifier	oai:union.ndltd.org:GEORGIA/oai:digitalarchive.gsu.edu:cs_theses-1031
Date	04 December 2006
Creators	Pandya, Milan
Publisher	Digital Archive @ GSU
Source Sets	Georgia State University
Detected Language	English
Type	text
Format	application/pdf
Source	Computer Science Theses
