Grid-Enabled Automatic Web Page Classification

Much research has been conducted on the retrieval and classification of web-based information. A major challenge is performance, especially when a classification algorithm must return results for the large data sets typical of the Web. This thesis describes a grid-enabled approach to automatic web page classification. The basic approach, which uses a vector space model (VSM), is described first, followed by an enhancement that applies a genetic algorithm (GA). The enhanced approach can efficiently process and classify candidate web pages from a number of web sites. A prototype is implemented and empirical studies are conducted. The contributions of this thesis are: 1) the application of grid computing to improve the performance of both VSM-based web page classification and GA-enhanced VSM-based web page classification; 2) an improvement of the VSM classification algorithm by applying a GA that uniquely discovers a set of training web pages while also generating a near-optimal set of parameter values for the VSM.
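
For readers unfamiliar with the vector space model mentioned above, the following is a minimal sketch of VSM-style classification, not the implementation used in the thesis. It assumes raw term-frequency weights, cosine similarity, and a nearest-training-page rule; the names `term_vector`, `cosine`, `classify`, and the `threshold` parameter are hypothetical and chosen only for illustration. The threshold stands in for the kind of VSM parameter that a GA could tune.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to raw term-frequency weights (an assumed weighting scheme)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(page_text, training_pages, threshold=0.2):
    """Return the label of the most similar training page.

    training_pages is a list of (label, text) pairs. Returns None when
    no training page reaches the (hypothetical) similarity threshold.
    """
    page = term_vector(page_text)
    best_label, best_score = None, 0.0
    for label, text in training_pages:
        score = cosine(page, term_vector(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Toy training set, invented for this example.
training = [
    ("sports", "football match score goal team"),
    ("finance", "stock market price shares bank"),
]
label = classify("the team won the football match", training)
```

In this sketch both the choice of training pages and the threshold value affect the result, which is why searching over them jointly, as the abstract describes for the GA, is a plausible way to improve a VSM classifier.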

Identifier: oai:union.ndltd.org:GEORGIA/oai:digitalarchive.gsu.edu:cs_theses-1022
Date: 12 June 2006
Creators: Metikurke, Seema Sreenivasamurthy
Publisher: Digital Archive @ GSU
Source Sets: Georgia State University
Detected Language: English
Type: text
Format: application/pdf
Source: Computer Science Theses
