Return to search

A Sliding-Window Approach to Mining Maximal Large Itemsets for Large Databases

Mining association rules, means a process of nontrivial extraction of implicit,
previously and potentially useful information from data in databases. Mining maximal
large itemsets is a further work of mining association rules, which aims to find
the set of all subsets of large (frequent) itemsets that could be representative of all large
itemsets. Previous algorithms to mining maximal large itemsets can be classified into two approaches: exhausted and
shortcut. The shortcut approach could generate smaller number of
candidate itemsets than the exhausted approach,
resulting in better performance in terms of time and storage space.
On the other hand, when updates to the transaction databases occur,
one possible approach is to re-run the mining algorithm on the whole
database. The other approach is incremental mining, which aims for efficient maintenance of discovered association rules
without re-running the mining algorithms. However,
previous algorithms for mining maximal large itemsets based on the shortcut approach
can not support incremental mining for mining maximal large itemsets.
While the algorithms for incremental mining, {it e.g.}, the SWF
algorithm, could not efficiently support mining maximal large
itemsets, since it is based on the exhausted approach.
Therefore, in this thesis, we focus on the design of an
algorithm which could provide good performance for both mining maximal itemsets and incremental mining.
Based on some observations, for example, ``{it if an itemset is large, all its
subsets must be large; therefore, those subsets need not to be examined
further}", we propose a Sliding-Window approach, the SWMax algorithm, for
efficiently mining maximal large itemsets and incremental mining. Our
SWMax algorithm is a two-passes partition-based approach. We will find all candidate
1-itemsets ($C_1$), candidate 3-itemsets ($C_3$), large 1-itemsets ($L_1$),
and large 3-itemsets ($L_3$) in the first pass.
We generate the virtual maximal large itemsets after the first pass. Then, we use $L_1$ to generate $C_2$, use $L_3$
to generate $C_4$, use $C_4$ to generate $C_5$, until there is no
$C_k$ generated. In the second pass, we use the virtual maximal large itemsets to
prune $C_k$, and decide the maximal large itemsets.
For incremental mining, we consider two cases: (1)
data insertion, (2) data deletion. Both in Case 1 and Case 2, if an itemset
with size equal to 1 is not large in the original database, it could not be found in the
updated database based on the SWF algorithm. That is, a missing case
could occur in the incremental mining process of the SWF
algorithm, because the SWF algorithm only keeps the $C_2$ information.
While our SWMax algorithm could support incremental mining
correctly, since $C_1$ and $C_3$ are maintained in our algorithm.
We generate some synthetic databases to simulate the real transaction
databases in our simulation. From our simulation, the
results show that our SWMax algorithm could generate fewer number of candidates
and needs less time than the SWF algorithm.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0728104-135327
Date28 July 2004
CreatorsChang, Yuan-feng
ContributorsYe-in Chang, Chien-i Lee, Tei-wei Kuo, Shian-hua Lin
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0728104-135327
Rightsnot_available, Copyright information available at source archive

Page generated in 0.1674 seconds