Global ETD Search

1	A Sliding-Window Approach to Mining Maximal Large Itemsets for Large Databases Chang, Yuan-feng 28 July 2004 (has links) Mining association rules, means a process of nontrivial extraction of implicit, previously and potentially useful information from data in databases. Mining maximal large itemsets is a further work of mining association rules, which aims to find the set of all subsets of large (frequent) itemsets that could be representative of all large itemsets. Previous algorithms to mining maximal large itemsets can be classified into two approaches: exhausted and shortcut. The shortcut approach could generate smaller number of candidate itemsets than the exhausted approach, resulting in better performance in terms of time and storage space. On the other hand, when updates to the transaction databases occur, one possible approach is to re-run the mining algorithm on the whole database. The other approach is incremental mining, which aims for efficient maintenance of discovered association rules without re-running the mining algorithms. However, previous algorithms for mining maximal large itemsets based on the shortcut approach can not support incremental mining for mining maximal large itemsets. While the algorithms for incremental mining, {it e.g.}, the SWF algorithm, could not efficiently support mining maximal large itemsets, since it is based on the exhausted approach. Therefore, in this thesis, we focus on the design of an algorithm which could provide good performance for both mining maximal itemsets and incremental mining. Based on some observations, for example, ``{it if an itemset is large, all its subsets must be large; therefore, those subsets need not to be examined further}", we propose a Sliding-Window approach, the SWMax algorithm, for efficiently mining maximal large itemsets and incremental mining. Our SWMax algorithm is a two-passes partition-based approach. We will find all candidate 1-itemsets ($C_1$), candidate 3-itemsets ($C_3$), large 1-itemsets ($L_1$), and large 3-itemsets ($L_3$) in the first pass. We generate the virtual maximal large itemsets after the first pass. Then, we use $L_1$ to generate $C_2$, use $L_3$ to generate $C_4$, use $C_4$ to generate $C_5$, until there is no $C_k$ generated. In the second pass, we use the virtual maximal large itemsets to prune $C_k$, and decide the maximal large itemsets. For incremental mining, we consider two cases: (1) data insertion, (2) data deletion. Both in Case 1 and Case 2, if an itemset with size equal to 1 is not large in the original database, it could not be found in the updated database based on the SWF algorithm. That is, a missing case could occur in the incremental mining process of the SWF algorithm, because the SWF algorithm only keeps the $C_2$ information. While our SWMax algorithm could support incremental mining correctly, since $C_1$ and $C_3$ are maintained in our algorithm. We generate some synthetic databases to simulate the real transaction databases in our simulation. From our simulation, the results show that our SWMax algorithm could generate fewer number of candidates and needs less time than the SWF algorithm. partition Incremental Mining Maximal Large Itemsets Data Mining Association Rules

Search results

A Sliding-Window Approach to Mining Maximal Large Itemsets for Large Databases