Global ETD Search

Return to search

A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams

Online mining the maximal frequent itemsets over data streams is an important problem in data mining. The maximal frequent itemset is the itemset which the support is large or equal to the minimal support and the itemset is not the subset or superse of each itemset. Previous algorithms to mine the maximal frequent itemsets in the traditional database are not suitable for data streams. Because data streams have some characteristics: (1) continuous (2) fast (3) no data limit (4) real time (5) searching once, mining data streams have many new challenges. First, they are unrealistic to keep the entire stream in the main memory or even in a secondary storage area, since a data stream comes continuously and the amount of data is unbounded. Second, traditional methods of mining on stored datasets by multiple scans are infeasible, since the streaming data is passed only once. Third, mining streams requires fast, real-time processing in order to keep up with the high data arrival rate and mining results are expected to be available within short response time. In order to solve mining maximal frequent itemsets from data streams using the landmark window model, Mao et. al. propose the INSTANT algorithm. In the landmark window model, knowledge discovery is performed based on the values between the beginning time and the present. The advantage of using the landmark window model is that the results are correct as compared to the other models. The structure of the INSTANT algorithm is simple and it can save many memory space. But it takes long time in mining the maximal frequent itemsets. When the new transactions comes, the number of comparisons between the old transactions of INSATNT algorithm is too much. In this thesis, we propose the Set-Checking algorithm to mine frequent itemsets from data streams using the landmark window model. We use the structure of lattice to store our information. The structure of lattice records the subset relationship between the child node and the father node. For every node, we can record the itemset and the support. When the new transaction comes, we consider five relations: (1) equivalent (2) superset (3) subset (4) intersection (5) empty relations. According to the lattice structure of the five sets , we can add the transaction and the renew support efficiently. From our simulation result, we find that the process time of our Set-Checking algorithm is faster than that of the INSTANT algorithm.

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0715111-180126

Itemset

Landmark Window Model

Lattice

Maximal Frequent Itemset

Data Stream

Identifer	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0715111-180126
Date	15 July 2011
Creators	Lin, Pei-Ying
Contributors	Gen-Huey Chen, Ye-In Chang, Chien-I Lee, Shian-Hua Lin
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	English
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0715111-180126
Rights	withheld, Copyright information available at source archive

Page generated in 0.0025 seconds

A Set-Checking Algorithm for Mining Maximal Frequent Itemsets from Data Streams

Description

Links & Downloads

Tags

Additional Fields