Online mining association rules over data streams is an important issue in the area of data mining, where an association rule means that the presence of some items in a transaction will imply the presence of other items in the same transaction. There are many applications of using association rules in data streams, such as market analysis, network security, sensor networks and web tracking.
Mining closed frequent itemsets is a further work of mining association rules, which aims to find the subsets of frequent itemsets that could extract all frequent itemsets. Formally, a
closed frequent itemset is an frequent itemset which has no superset with the same support as it. Since data streams are continuous, high-speed, and unbounded, archiving everything from data streams is impossible. That is, we can only scan once for the data streams and it is a main-memory database. Therefore, previous algorithms to mine closed frequent itemsets in the traditional database are not suitable for data streams. On the other hand, many applications are interested in the most recent data, and there is a model to deal with the most recent data in data streams, called emph{Sliding Window Model}, which acquires the recent data with a window size meets this characteristic. One of well-known algorithms for mining closed frequent itemsets which based on the sliding window model is the NewMoment algorithm. However, the NewMoment algorithm could not efficiently mine closed frequent itemsets in data streams, since they will generate closed frequent itemsets and many unclosed frequent itemsets. Moreover, when data in the sliding window is incrementally updated, the NewMoment algorithm needs to reconstruct the whole tree structure. Therefore, in this thesis, we propose a
sliding window approach, the Subset-Lattice algorithm, which embeds the subset property into the lattice structure to efficiently mine closed frequent itemsets. Basically, Our proposed algorithm considers five kinds of set concepts : (1) equivalent, (2) superset, (3) subset, (4) intersection, (5) empty relation, when data items are inserted. We judge closed frequent itemsets without generating unclosed frequent itemsets by these five kinds of set concepts.
Moreover, when data in the sliding window is incrementally updated, our Subset-Lattice algorithm will not reconstruct the whole lattice structure. Therefore, our Subset-Lattice algorithm is more efficient than the Moment algorithm. Furthermore, we use the bit-pattern to represent the itemsets, and use bit-operations to speed up the set-checking. From our simulation results, we show that our Subset-Lattice algorithm needs less memory and less processing time than the NewMoment algorithm. When window slides, the execution time could be saved up to 50\%.
Identifer | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0625109-204302 |
Date | 25 June 2009 |
Creators | Peng, Wei-hau |
Contributors | Chien-I Lee, Ye-In Chang, Gen-Huey Chen, San-Yi Huang |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0625109-204302 |
Rights | withheld, Copyright information available at source archive |
Page generated in 0.002 seconds