Global ETD Search

Automatic classification and metadata generation for world-wide web resources

The aims of this project are to investigate the possibility and potential of automatically classifying Web documents according to a traditional library classification scheme, and to investigate the extent to which automatic classification can be used in automatic metadata generation on the Web. The Wolverhampton Web Library (WWLib) is a search engine that classifies UK Web pages according to the Dewey Decimal Classification (DDC). This search engine is introduced as an example application that would benefit from an automatic classification component such as that described in the thesis. Different approaches to information resource discovery and resource description on the Web are reviewed, as are traditional Information Retrieval (IR) techniques relevant to resource discovery on the Web. The design, implementation and evaluation of an automatic classifier that classifies Web pages according to DDC are documented. The evaluation shows that automatic classification is possible and could be used to improve the performance of a search engine. This classifier is then extended to perform automatic metadata generation using the Resource Description Framework (RDF) and Dublin Core. A proposed RDF data model, schema and automatically generated RDF syntax are documented. Automatically generated RDF metadata describing a range of automatically classified documents is shown. The research shows that automatic classification is possible and could potentially be used to enable context-sensitive browsing in automated Web search engines. The classifications could also be used in generating context-sensitive metadata tailored specifically for the search engine domain.

http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.248932

001

Web pages

Identifier	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:248932
Date	January 2002
Creators	Jenkins, Charlotte
Publisher	University of Wolverhampton
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	http://hdl.handle.net/2436/89094
