Return to search

Automatic classification and metadata generation for world-wide web resources

The aims of this project are to investigate the possibility and potential of automatically classifying Web documents according to a traditional library classification scheme and to investigate the extent to which automatic classification can be used in automatic metadata generation on the web. The Wolverhampton Web Library (WWLib) is a search engine that classifies UK Web pages according to Dewey Decimal Classification (DDC). This search engine is introduced as an example application that would benefit from an automatic classification component such as that described in the thesis. Different approaches to information resource discovery and resource description on the Web are reviewed, as are traditional Information Retrieval (IR) techniques relevant to resource discovery on the Web. The design, implementation and evaluation of an automatic classifier, that classifies Web pages according to DDC, is documented. The evaluation shows that automatic classification is possible and could be used to improve the performance of a search engine. This classifier is then extended to perform automatic metadata generation using the Resource Description Framework (RDF) and Dublin Core. A proposed RDF data model, schema and automatically generated RDF syntax are documented. Automatically generated RDF metadata describing a range of automatically classified documents is shown. The research shows that automatic classification is possible and could potentially be used to enable context sensitive browsing in automated web search engines. The classifications could also be used in generating context sensitive metadata tailored specifically for the search engine domain.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:248932
Date January 2002
CreatorsJenkins, Charlotte
PublisherUniversity of Wolverhampton
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttp://hdl.handle.net/2436/89094

Page generated in 0.0025 seconds