Return to search

An Obstruction-Check Approach to Mining Closed Sequential Patterns in Data Streams

Online mining sequential patterns over data streams is an important problem
in data mining. There are many applications of using sequential patterns in data
streams, such as market analysis, network security, sensor networks and web track-
ing. Previous studies have shown mining closed patterns provides more benefits than
mining the complete set of frequent patterns, since closed pattern mining leads to
compact results. A sequential pattern is closed if the sequential pattern does not
have any supersequence which has the same support. Chang et al. proposed a time-
based sliding window model. The time-based sliding window has two features, the
new item is inserted in front of a sequence, and the obsolete item is removed from of
tail of a sequence. For solving the problem of data mining in the time-based sliding
window, Chang et al. proposed an algorithm called SeqStream. It uses a data struc-
ture IST (Inverse Closed Sequence Tree) to keep the result. IST can incrementally be
updated by the SeqStream algorithm. Although the SeqStream algorithm has used
the technique of dividing the time-based sliding window to speed up the updating of
IST, the SeqStream algorithm still scans the sliding window many times when IST
needs to be updated. In this thesis, we propose an obstruction-check approach to
maintain the result of closed sequential patterns. Our approach is designed based
on the lattice structure. The feature of the lattice structure is that the parent is a
supersequence of its children. By utilizing this feature, we decide the obstruction
link between the parent and child if their support is the same. If a node does not
have any obstruction link parent, the node is a closed sequential pattern. Then we
can utilize this feature to locally travel the lattice structure. Moreover, we can fully
utilize the features of the time-based sliding window model to locally travel the lat-
tice structure. Based on the lattice structure, we propose the EULB (Exact Update
based on Lattice structure with Bit stream)-Lattice algorithm. The EULB-Lattice
algorithm is an exact method for mining data streams. We record additional informa-
tion, instead of scanning the entire sliding window. We conduct several experiments
using different synthetic data sets. The simulation results show that the proposed
algorithm outperforms the SeqStream algorithm.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0621110-150934
Date21 June 2010
CreatorsChin, Tsz-lin
ContributorsGen-Huey Chen, Ye-In Chang, Tei-Wei Kuo, Chien-I Lee
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0621110-150934
Rightswithheld, Copyright information available at source archive

Page generated in 0.002 seconds