A distributed approach to Frequent Itemset Mining at low support levels

Frequent Itemset Mining, the process of finding frequently co-occurring sets of items in a dataset, has been at the core of the field of data mining for the past 25 years. During this time, datasets have grown much faster than the algorithms' capacity to process them. Great progress has been made in optimizing this task on a single computer; however, despite years of research, very little progress has been made in parallelizing it. FPGrowth-based algorithms have proven notoriously difficult to parallelize, and Apriori has largely fallen out of favor with the research community.

In this thesis we introduce a parallel, Apriori-based Frequent Itemset Mining algorithm capable of distributing computation across large commodity clusters. Our case study demonstrates that our algorithm can efficiently scale to hundreds of cores on a standard Hadoop MapReduce cluster and can improve execution times by at least an order of magnitude at the lowest support levels. / Graduate / 0984 / 0800 / nclark@uvic.ca
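
For readers unfamiliar with the task, the following is a minimal sketch of level-wise Apriori mining with the support counting split across data partitions, in the spirit of a map step (count per partition) and a reduce step (sum the counts). It is an illustration only and not the algorithm developed in the thesis; the function names `count_partition` and `apriori`, and the use of an absolute transaction count for `min_support`, are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def count_partition(transactions, candidates):
    """Map-style step: count how often each candidate occurs in one partition."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def apriori(partitions, min_support):
    """Level-wise Apriori over partitioned data.

    min_support is an absolute number of transactions (an assumption of
    this sketch). Returns a dict mapping each frequent itemset to its count.
    """
    # Level 1 candidates: every single item seen anywhere in the data.
    candidates = {frozenset([item])
                  for part in partitions for t in part for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Reduce-style step: sum the per-partition counts.
        total = Counter()
        for part in partitions:
            total.update(count_partition(part, candidates))
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)

        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent
```

In a real cluster each call to `count_partition` would run on a different worker; here the partitions are processed in a plain loop so the sketch stays self-contained. Lower support thresholds let more candidates survive each level, which is why the counting step dominates the cost at low support.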

Identifier: oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/5803
Date: 22 December 2014
Creators: Clark, Neal
Contributors: Coady, Yvonne
Source Sets: University of Victoria
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
Rights: Available to the World Wide Web