
A Meaningful Candidate Approach to Mining Bi-Directional Traversal Patterns on the WWW

Chen, Jiun-rung 27 July 2004
Since the World Wide Web (WWW) appeared, more and more useful information has become available on it. To find this information, the application of data mining techniques to the WWW, referred to as Web mining, has become a research area of increasing importance. Mining traversal patterns is one of the important topics in Web mining. It focuses on finding the Web page sequences that users browse frequently. Although algorithms for mining association rules (e.g., the Apriori and DHP algorithms) can be applied to mine traversal patterns, they do not exploit the property of Web transactions and generate too many invalid candidate patterns, so they do not perform well. Wu et al. proposed SpeedTracer, an algorithm for mining traversal patterns that utilizes the property of Web transactions, i.e., the continuous property of the traversal patterns in the Web structure. Although SpeedTracer decreases the number of candidate patterns generated in the mining process, it does not efficiently use this property to reduce the number of checks made when checking the subsets of each candidate pattern. In this thesis, we design three algorithms for mining traversal patterns that improve on SpeedTracer. The first, SpeedTracer*-I, utilizes the property of Web transactions to directly generate and count all candidate patterns from user sessions. It also uses this property to improve the checking step when candidate patterns are generated. Building on SpeedTracer*-I, we propose the SpeedTracer*-II and SpeedTracer*-III algorithms, which improve its performance by decreasing the number of database scans. In SpeedTracer*-II, given a parameter n, we first apply SpeedTracer*-I to find L_n and then use L_n to generate all C_k, where k > n.
After generating all candidate patterns, we scan the database once to count them, which determines the frequent patterns. In SpeedTracer*-III, given a parameter n, we likewise apply SpeedTracer*-I to find L_n first, and then directly generate and count C_k from user sessions based on L_n, where k > n. The simulation results show that SpeedTracer*-I outperforms SpeedTracer in terms of processing time. They also show that SpeedTracer*-II and SpeedTracer*-III outperform SpeedTracer and SpeedTracer*-I, because the former two need fewer database scans than the latter two. Moreover, our simulation results show that all of our proposed algorithms perform better than Apriori-like algorithms (e.g., the FS and FDLP algorithms) in terms of processing time.
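
The idea of generating and counting candidates directly from user sessions can be illustrated with a minimal Python sketch. It is not the thesis's SpeedTracer* implementation. It only assumes that a traversal pattern is a contiguous run of pages within a session, that support is the number of sessions containing the pattern, and that a length-k candidate is kept only if its two contiguous (k-1)-subsequences are frequent. The function name, the session format, and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def frequent_contiguous_patterns(sessions, min_support):
    """Level-wise mining of contiguous page sequences (illustrative only).

    sessions: list of page-ID lists, one list per user session (assumed format).
    Returns {k: {pattern_tuple: support}} for every length k with frequent patterns.
    """
    frequent = {}
    prev = None  # frequent patterns of length k - 1
    k = 1
    while True:
        counts = Counter()
        for session in sessions:
            seen = set()  # count each pattern at most once per session
            for i in range(len(session) - k + 1):
                cand = tuple(session[i:i + k])
                # Because patterns are contiguous, only the prefix and the
                # suffix of length k - 1 need to be checked, not every subset.
                if prev is not None and (
                    cand[:-1] not in prev or cand[1:] not in prev
                ):
                    continue
                seen.add(cand)
            counts.update(seen)
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            break
        frequent[k] = level
        prev = level
        k += 1
    return frequent


if __name__ == "__main__":
    # Hypothetical sessions: each list is the ordered pages one user visited.
    demo_sessions = [
        ["A", "B", "C", "D"],
        ["A", "B", "C"],
        ["B", "C", "D"],
        ["A", "C"],
    ]
    for length, patterns in frequent_contiguous_patterns(demo_sessions, 2).items():
        print(length, patterns)
```

This sketch scans the sessions once per pattern length. The abstract attributes the gains of SpeedTracer*-II and SpeedTracer*-III to needing fewer such scans, which this sketch does not attempt to reproduce.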
