
Framework for web log pre-processing within web usage mining

Web mining is gaining popularity by the day, and the role of the web in providing invaluable information about users' behaviour and navigational patterns is now highly appreciated by information technology specialists and businesses alike. Nevertheless, given the enormity of the web and the complexities involved in delivering and retrieving electronic information, one can imagine the difficulties involved in extracting a set of minable objects from huge volumes of raw web log data. Together with the fact that web mining is a new science, this may explain why research on data pre-processing is still limited in scope. And although the debate on major issues is still gaining momentum, attempts to establish a coherent and accurate web usage pre-processing framework are still non-existent. As a contribution to the existing debate, this research aims at formulating a workable, reliable, and coherent pre-processing framework. The present study will address the following issues: enhancing and maximising knowledge about every visit made to a given website from multiple web logs, even when they have different schemas; improving the process of eliminating excessive web log data that are not related to users' behaviour; modifying the existing approaches to session identification in order to obtain more accurate results; and eliminating redundant data that result from repeatedly adding cached data to the web logs regardless of whether or not the added page is a frameset. In addition to the suggested improvements, the study will also introduce a novel task, namely "automatic web log integration". This will make it possible to integrate different web logs with different schemas into a unified data set. Finally, the study will incorporate unnecessary information, particularly that pertaining to malicious website visits, into the non-user request removal task. Put together, the suggested improvements and the novel tasks result in a coherent pre-processing framework.
To test the reliability and validity of the framework, a website is created in order to perform the necessary experimental work and a prototype pre-processing tool is devised and employed to support it.
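Two of the tasks named above, non-user request removal and session identification, can be illustrated with a minimal Python sketch. It is not the framework or the prototype tool described in this thesis. It assumes a hypothetical unified record of (client IP, timestamp, URL), a hypothetical list of static-resource suffixes, and the widely used 30-minute inactivity timeout as the session boundary; the names `is_user_request` and `identify_sessions` are invented for illustration.

```python
from datetime import datetime, timedelta

# Assumption: suffixes treated as embedded resources rather than user requests.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
# Assumption: a gap longer than this starts a new session for the same client.
SESSION_TIMEOUT = timedelta(minutes=30)


def is_user_request(url):
    """Return False for requests that look like embedded static resources."""
    path = url.split("?", 1)[0].lower()  # ignore the query string
    return not path.endswith(STATIC_SUFFIXES)


def identify_sessions(records):
    """Group (ip, timestamp, url) records into timeout-based sessions.

    Returns a list of (ip, [(timestamp, url), ...]) in order of session start.
    """
    sessions = []
    open_session = {}  # ip -> page list of that client's most recent session
    for ip, ts, url in sorted(records, key=lambda r: r[1]):
        if not is_user_request(url):
            continue  # non-user request removal
        current = open_session.get(ip)
        if current is None or ts - current[-1][0] > SESSION_TIMEOUT:
            current = []  # first visit or inactivity gap: open a new session
            sessions.append((ip, current))
            open_session[ip] = current
        current.append((ts, url))
    return sessions
```

Identifying a client by IP address alone is a simplification: proxies and shared addresses make real logs harder to sessionise, which is one reason more accurate approaches are worth studying.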

Identifier oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:488456
Date January 2004
Creators Khairo-Sindi, Mazin Omar
Publisher University of Manchester
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation
