Sequential pattern mining forms a basis for solving problems in domains such as bioinformatics and web usage mining. Research in this field continues to seek faster algorithms. WAP-Tree based algorithms, which emerged from the web usage mining literature, have shown remarkable performance on single-item sequence databases. In this study, we investigated the application of WAP-Tree based mining to multi-item sequential pattern mining and designed an extension of the WAP-Tree data structure for multi-item sequence databases, the MULTI-WAP-Tree. In addition, we propose a new mining strategy on the WAP-Tree that involves a hybrid traversal of the search space of possible sequences and a new early pruning idea called the Sibling Principle on the Pattern Tree. Two algorithms, FOF-PT and MULTI-FOF-PT, which apply this strategy to the WAP-Tree and the MULTI-WAP-Tree respectively, were developed. Experiments showed that FOF-PT outperforms both the other WAP-Tree based algorithms and PrefixSpan in terms of execution time. Moreover, the experimental results revealed that MULTI-FOF-PT finds patterns faster than PrefixSpan on dense multi-item sequence databases with small alphabets.
Identifier | oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/12614638/index.pdf |
Date | 01 September 2012 |
Creators | Onal, Kezban Dilek |
Contributors | Senkul, Pinar |
Publisher | METU |
Source Sets | Middle East Technical Univ. |
Language | English |
Detected Language | English |
Type | M.S. Thesis |
Format | text/pdf |
Rights | Access forbidden for 1 year |