Automatically Extract Information from Web Documents

The Internet can be regarded as a reservoir of useful information in textual form: product catalogs, airline schedules, stock market quotations, weather forecasts, and so on. There has been much interest in building systems that gather such information on a user's behalf. However, because these information resources are formatted differently, mechanically extracting their content is difficult. Systems that use such resources typically rely on hand-coded wrappers, that is, customized procedures for information extraction. Structured data objects are a very important type of information on the Web. Such data objects are often records from underlying databases, displayed in Web pages using fixed templates. Mining data records in Web pages is useful because they typically present their host pages' essential information, such as lists of products and services. Extracting these structured data objects makes it possible to integrate data from multiple Web pages and provide value-added services, e.g., comparative shopping, meta-querying, and search. Web content mining has thus become an area of interest for many researchers because of the phenomenal growth of Web content and the economic benefits associated with it. However, due to the heterogeneity of Web pages, automated discovery of targeted information remains a challenging problem.
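
To make the notion of a hand-coded wrapper concrete, the following is a minimal sketch in Python and is not taken from the thesis. It assumes a hypothetical page template in which each data record is an `<li class="product">` element holding `<span class="name">` and `<span class="price">` fields; the class name `ProductWrapper`, the helper `extract_records`, and the sample page are all illustrative assumptions.

```python
from html.parser import HTMLParser


class ProductWrapper(HTMLParser):
    """Hand-coded wrapper for ONE hypothetical page template.

    Assumption: every record is an <li class="product"> containing
    <span class="name"> and <span class="price"> elements.
    """

    def __init__(self):
        super().__init__()
        self.records = []      # extracted data records
        self._current = None   # record being filled, or None outside a record
        self._field = None     # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self._current = {}
        elif (tag == "span" and self._current is not None
              and css_class in ("name", "price")):
            self._field = css_class

    def handle_data(self, data):
        # Accumulate text only while inside a recognized field.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span" and self._field is not None:
            self._current[self._field] = self._current.get(self._field, "").strip()
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def extract_records(html):
    """Run the wrapper over one page and return its records."""
    parser = ProductWrapper()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical sample page that follows the assumed template.
PAGE = """
<ul>
  <li class="product"><span class="name">Laptop</span> <span class="price">$899</span></li>
  <li class="product"><span class="name">Phone</span> <span class="price">$499</span></li>
</ul>
"""

records = extract_records(PAGE)
for record in records:
    print(record["name"], record["price"])
```

The sketch also shows why hand-coded wrappers do not scale across heterogeneous sources: a page that presents the same records with different tags or class names yields nothing, so each new template needs its own wrapper.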

Data mining

Computer Sciences

Identifier	oai:union.ndltd.org:WKU/oai:digitalcommons.wku.edu:theses-1379
Date	01 December 2007
Creators	Sharma, Dipesh
Publisher	TopSCHOLAR®
Source Sets	Western Kentucky University Theses
Detected Language	English
Type	text
Format	application/pdf
Source	Masters Theses & Specialist Projects

