A Meaningful Candidate Approach to Mining Bi-Directional Traversal Patterns on the WWW

Since the World Wide Web (WWW) appeared, more and more useful information has
become available on it. To help find this information, the application of data
mining techniques to the WWW, referred to as Web mining, has become a research
area of increasing importance. Mining traversal patterns is one of the important
topics in Web mining. It focuses on finding the Web page sequences that are
frequently browsed by users. Although algorithms for mining association rules
(e.g., the Apriori and DHP algorithms) could be applied to mine traversal patterns, they
do not exploit the property of Web transactions and generate too many invalid candidate
patterns. Thus, they cannot provide good performance. Wu et al. proposed
an algorithm for mining traversal patterns, SpeedTracer, which utilizes the property
of Web transactions, i.e., the continuous property of traversal patterns in the Web
structure. Although SpeedTracer decreases the number of candidate patterns generated in the
mining process, it does not efficiently utilize the property of Web transactions to
decrease the number of checks performed while checking the subsets of each candidate pattern.
In this thesis, we design three algorithms for mining traversal patterns, each of which
improves on the SpeedTracer algorithm. The first algorithm, SpeedTracer*-I, utilizes the
property of Web transactions to directly generate and count all candidate patterns
from user sessions. Moreover, it utilizes this property to improve the checking step
when candidate patterns are generated. Next, based on the SpeedTracer*-I algorithm,
we propose the SpeedTracer*-II and SpeedTracer*-III algorithms. These two
algorithms improve the performance of the SpeedTracer*-I algorithm by decreasing
the number of database scans. In the SpeedTracer*-II algorithm,
given a parameter n, we first apply the SpeedTracer*-I algorithm to find L_n, the set of
frequent patterns of length n, and then use L_n to generate all candidate sets C_k, where k > n.
After generating all candidate patterns, we scan the database once to count them, and the
frequent patterns can then be determined. In the SpeedTracer*-III algorithm, given a parameter n, we also
first apply the SpeedTracer*-I algorithm to find L_n, and then directly generate and count
C_k from user sessions based on L_n, where k > n. The simulation results show that
the SpeedTracer*-I algorithm outperforms the SpeedTracer
algorithm in terms of processing time. The simulation results also show
that the SpeedTracer*-II and SpeedTracer*-III algorithms outperform the SpeedTracer and
SpeedTracer*-I algorithms, because the former two need fewer database scans
than the latter two. Moreover, our simulation results
show that all of our proposed algorithms provide better performance than
Apriori-like algorithms (e.g., the FS and FDLP algorithms) in terms of processing
time.
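
As a rough illustration of the continuous property mentioned in the abstract, the following Python sketch counts contiguous page sequences directly from user sessions and then extends frequent patterns by one page. It is not the thesis's SpeedTracer* implementation: the function names, the toy sessions, the per-session support count, and the suffix/prefix join used to build longer candidates are all assumptions made for illustration only.

```python
from collections import Counter


def contiguous_patterns(session, k):
    """Yield every run of k consecutive pages in one session."""
    for i in range(len(session) - k + 1):
        yield tuple(session[i:i + k])


def frequent_contiguous(sessions, k, min_support):
    """Count length-k contiguous patterns; support = number of sessions containing them.

    Assumption: a pattern is counted at most once per session.
    """
    counts = Counter()
    for session in sessions:
        counts.update(set(contiguous_patterns(session, k)))
    return {p: c for p, c in counts.items() if c >= min_support}


def extend_candidates(frequent_prev):
    """Join patterns whose suffix equals another pattern's prefix.

    Hypothetical join step: because patterns are contiguous, a candidate of
    length k has only two contiguous sub-patterns of length k-1 (its prefix
    and its suffix), and both are frequent by construction here.
    """
    patterns = list(frequent_prev)
    return {
        left + (right[-1],)
        for left in patterns
        for right in patterns
        if left[1:] == right[:-1]
    }


# Toy sessions (invented for this example); each is an ordered list of page IDs.
sessions = [
    ["A", "B", "C", "D"],
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["A", "C", "D"],
]

frequent_2 = frequent_contiguous(sessions, 2, min_support=2)
candidates_3 = extend_candidates(frequent_2)
```

With these toy sessions, the pair ("A", "C") occurs in only one session and is dropped, so no length-3 candidate containing it is ever generated. An itemset-style generator that ignores page order and adjacency would have to consider many more combinations.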

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0727104-153403

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0727104-153403
Date	27 July 2004
Creators	Chen, Jiun-rung
Contributors	Tei-wei Kuo, Ye-in Chang, Chien-i Lee, Shian-hua Lin
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0727104-153403
Rights	not_available, Copyright information available at source archive
