Spelling suggestions: "subject:"maximal frequent itemset"" "subject:"maximal frequent itemsets""
1 |
A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data StreamsLin, Pei-Ying 15 July 2011 (has links)
Online mining the maximal frequent itemsets over data streams is an important problem in data mining. The maximal frequent itemset is the itemset which the support is large or equal to the minimal support and the itemset is not the subset or superse of each itemset. Previous algorithms to mine the maximal frequent itemsets in the traditional database are not suitable for data streams. Because data streams have some characteristics: (1) continuous (2) fast (3) no data limit (4) real time (5) searching once, mining data streams have many new challenges. First, they are unrealistic to keep the entire stream in the main memory or even in a secondary storage area, since a data stream comes continuously and the amount of data is unbounded. Second, traditional methods of mining on stored datasets by multiple scans are infeasible, since the streaming data is passed only once. Third, mining streams requires fast, real-time processing in order to keep up with the high data arrival rate and mining results are expected to be available within short response time. In order to solve mining maximal frequent itemsets from data streams using the landmark window model, Mao et. al. propose the INSTANT algorithm. In the landmark window model, knowledge discovery is performed based on the values between the beginning time and the present. The advantage of using the landmark window model is that the results are correct as compared to the other models. The structure of the INSTANT algorithm is simple and it can save many memory space. But it takes long time in mining the maximal frequent itemsets. When the new transactions comes, the number of comparisons between the old transactions of INSATNT algorithm is too much. In this thesis, we propose the Set-Checking algorithm to mine frequent itemsets from data streams using the landmark window model. We use the structure of lattice to store our information. The structure of lattice records the subset relationship between the child node and the father node. For every node, we can record the itemset and the support. When the new transaction comes, we consider five relations: (1) equivalent (2) superset (3) subset (4) intersection (5) empty relations. According to the lattice structure of the five sets , we can add the transaction and the renew support efficiently. From our simulation result, we find that the process time of our Set-Checking algorithm is faster than that of the INSTANT algorithm.
|
2 |
A Subset-Lattice Algorithm for Mining Maximal Frequent Itemsets over a Data Stream Sliding WindowWang, Syuan-Yun 09 July 2012 (has links)
Online mining association rules in data streams is an important field in the data
mining. Among them, mining the maximal frequent itemsets is also an important
issue. A frequent itemset is called maximal if it is not a subset of any other frequent
itemset. The set of all the maximal frequent itemsets is denoted as the maximal
frequent itemset. Because data streams are continuous, high speed, unbounded, and
real time. As a result, we can only scan once for the data streams. Therefore, the
previous algorithms to mine the maximal frequent itemsets in the traditional
databases are not suitable for the data streams. Furthermore, many applications are
interested in the recent data streams, and the sliding window is the model which
deal with the most recent data streams. In the sliding window model, a window size
is required. One of the algorithms for mining the maximal frequent itemsets based
on the sliding window model is called the MFIoSSW algorithm. The MFIoSSW
algorithm uses a compact structure to mine the maximal frequent itemsets. It uses
an array-based structure A to store the maximal frequent itemsets and other helpful
itemsets. But it takes long time to mine the maximal frequent itemsets. When the
new transaction comes, the number of comparison between the new transaction and
the old transactions is too much. Therefore, in this project, we propose a sliding
window approach, the Subset-Lattice algorithm. We use the lattice structure to store
the information of the transactions. The structure of the lattice stores the relationship
between the child node and the father node. In each node, we record the itemset and
the support. When the new transaction comes, we consider five relations: (1)
equivalent, (2) subset, (3) intersection, (4) empty set, (5) superset. With this five
relations, we can add the new transactions and update the support efficiently.
|
Page generated in 0.0553 seconds