Global ETD Search

Return to search

WebDoc an Automated Web Document Indexing System

This thesis describes WebDoc, an automated system that classifies Web documents according to the Library of Congress classification system. This work is an extension of an early version of the system that successfully generated indexes for journal articles. The unique features of Web documents, as well as how they will affect the design of a classification system, are discussed. We argue that full-text analysis of Web documents is inevitable, and contextual information must be used to assist the classification. The architecture of the WebDoc system is presented. We performed experiments on it with and without the assistance of contextual information. The results show that contextual information improved the system?s performance significantly.

information retrieval

Web document indexing

Identifer	oai:union.ndltd.org:MSSTATE/oai:scholarsjunction.msstate.edu:td-6000
Date	13 December 2002
Creators	Tang, Bo
Publisher	Scholars Junction
Source Sets	Mississippi State University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations

Page generated in 0.0015 seconds

WebDoc an Automated Web Document Indexing System

Description

Links & Downloads

Tags

Additional Fields