Scalable APRIORI-based frequent pattern discovery

Frequent itemset mining, the task of finding sets of items that frequently occur together in a dataset, has been at the core of the field of data mining for the past sixteen years. In that time, the size of datasets has grown much faster than has the ability of existing algorithms to handle those datasets. Consequently, improvements are needed.

In this thesis, we take the classic algorithm for the problem, A Priori, and improve it quite significantly by introducing what we call a vertical sort. We then use the benchmark large dataset, webdocs, from the FIMI 2004 conference to contrast our performance against several state-of-the-art implementations and demonstrate not only equal efficiency with lower memory usage at all support thresholds, but also the ability to mine support thresholds as yet unattempted in the literature. We also indicate how we believe this work can be extended to achieve yet more impressive results.
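
For readers unfamiliar with the baseline, the classic A Priori procedure named in the abstract can be sketched roughly as follows. This is the textbook level-wise algorithm only, not the thesis's vertical-sort variant, which the abstract does not describe. The function name `apriori`, the use of an absolute transaction count for `min_support`, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions (min_support is an absolute count here)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the data per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the webdocs benchmark.
    baskets = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["butter", "milk"],
        ["bread", "butter"],
    ]
    for itemset, count in apriori(baskets, min_support=2).items():
        print(sorted(itemset), count)
```

The sketch makes one full pass over the transactions for each itemset size and compares every candidate with every transaction. It makes no attempt at the efficiency or memory behaviour the abstract reports.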

http://hdl.handle.net/1828/1370

data mining

apriori

frequent itemset mining

machine learning

Identifier	oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/1370
Date	28 April 2009
Creators	Chester, Sean
Contributors	Thomo, Alex
Source Sets	University of Victoria
Language	English
Detected Language	English
Type	Thesis
Rights	Available to the World Wide Web
