
A distributed approach to Frequent Itemset Mining at low support levels

Frequent Itemset Mining, the process of finding sets of items that frequently co-occur in a dataset, has been at the core of data mining for the past 25 years. During this time, datasets have grown much faster than algorithms' capacity to process them. Great progress has been made in optimizing this task on a single computer; however, despite years of research, very little progress has been made on parallelizing it. FPGrowth-based algorithms have proven notoriously difficult to parallelize, and Apriori has largely fallen out of favor with the research community.
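To make the task concrete, here is a minimal single-machine sketch of the textbook level-wise Apriori procedure: frequent (k-1)-itemsets are joined into k-item candidates, candidates with any infrequent (k-1)-subset are pruned, and one pass over the data counts the survivors. It is an illustration under stated assumptions, not the thesis's implementation: the function name `apriori`, the use of an absolute transaction count for `min_support`, and the toy basket data are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset contained in at
    least min_support transactions (min_support is an absolute count here)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count each individual item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets whose
        # union has exactly k items.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Support counting: one full pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    for itemset, count in apriori(baskets, min_support=2).items():
        print(sorted(itemset), count)
```

Lowering `min_support` lets more itemsets survive each level, so both the candidate sets and the number of counting passes grow; this is the low-support regime the title refers to.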

In this thesis we introduce a parallel, Apriori-based Frequent Itemset Mining algorithm capable of distributing computation across large commodity clusters. Our case study demonstrates that the algorithm can scale efficiently to hundreds of cores on a standard Hadoop MapReduce cluster, and can improve execution times by at least an order of magnitude at the lowest support levels.

Graduate / 0984 / 0800 / nclark@uvic.ca
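The abstract does not spell out how work is divided across the cluster. One common way to parallelize a single Apriori level, shown here only as an assumption-laden sketch and not as the thesis's algorithm, is count distribution: each mapper counts candidate occurrences in its own partition of the transactions, and a reducer sums the partial counts per candidate and applies the support threshold. The map, shuffle, and reduce phases are simulated in plain Python; the names `map_partition`, `shuffle`, `reduce_counts`, and `count_level` and the two-way partitioning are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_partition(partition, candidates):
    """Map phase: emit (candidate, 1) for each candidate found in a transaction."""
    for transaction in partition:
        t = frozenset(transaction)
        for c in candidates:
            if c <= t:
                yield c, 1


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups, min_support):
    """Reduce phase: sum partial counts and keep candidates meeting min_support."""
    totals = {key: sum(values) for key, values in groups.items()}
    return {key: n for key, n in totals.items() if n >= min_support}


def count_level(partitions, candidates, min_support):
    # On a cluster each partition would go to a separate mapper; here the
    # "mappers" simply run one after another in a single process.
    emitted = chain.from_iterable(map_partition(p, candidates) for p in partitions)
    return reduce_counts(shuffle(emitted), min_support)


if __name__ == "__main__":
    partitions = [
        [{"bread", "milk"}, {"bread", "butter"}],
        [{"bread", "milk", "butter"}, {"milk"}],
    ]
    candidates = [frozenset(p) for p in
                  (("bread", "milk"), ("bread", "butter"), ("milk", "butter"))]
    print(count_level(partitions, candidates, min_support=2))
```

Emitting one pair per occurrence keeps the sketch short; a real Hadoop job would typically add a combiner to pre-sum counts on each mapper before the shuffle, cutting network traffic.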

http://hdl.handle.net/1828/5803

Apriori

MapReduce

Frequent Itemset Mining

Identifier	oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/5803
Date	22 December 2014
Creators	Clark, Neal
Contributors	Coady, Yvonne
Source Sets	University of Victoria
Language	English
Detected Language	English
Type	Thesis
Format	application/pdf
Rights	Available to the World Wide Web
