Global ETD Search

Mining frequent sequences in one database scan using distributed computers

Existing frequent-sequence mining algorithms perform multiple scans of a database, or of a structure that captures the database. In this M.Sc. thesis, I propose a frequent-sequence mining algorithm that mines each database row as it reads it, so that it can potentially complete mining in the time it takes to read the database once. I achieve this by having my algorithm enumerate all sub-sequences of each row as the row is read.
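
As an illustration of the single-scan idea described above, the following is a minimal sketch and not the thesis implementation. It assumes each row is an ordered tuple of single items, that a sub-sequence keeps item order but may skip items, and that support is the number of rows containing a sub-sequence. The names `subsequences`, `mine_single_scan`, and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def subsequences(row):
    """Yield each distinct non-empty sub-sequence of row (order kept, gaps allowed)."""
    seen = set()
    for length in range(1, len(row) + 1):
        # combinations() picks positions in increasing order, so item order is preserved.
        for candidate in combinations(row, length):
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


def mine_single_scan(rows, min_support):
    """Count sub-sequences while reading each row exactly once (assumed support model)."""
    counts = Counter()
    for row in rows:  # the only pass over the data
        counts.update(subsequences(row))
    return {seq: n for seq, n in counts.items() if n >= min_support}


rows = [("a", "b", "c"), ("a", "c"), ("b", "c")]
frequent = mine_single_scan(rows, min_support=2)
```

A row of n items has up to 2^n - 1 sub-sequences, which is why the enumeration step dominates the running time.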

Since sub-sequence enumeration is a time-consuming process, I create a method to distribute the work over multiple computers, processors, and thread units, while balancing the load across all resources and limiting the amount of communication, so that my algorithm scales well with respect to the number of computers used. Experimental results show that my algorithm is effective, and can potentially complete the mining process in close to the time it takes to perform one scan of the input database.
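
The abstract does not describe the distribution method in detail, so the sketch below is only one possible reading, under stated assumptions: rows are handed to the least-loaded worker as they are read, each worker counts sub-sequences locally, and the only communication is a single merge of the local counts at the end. The cost estimate `2 ** len(row)` and the names `partition_rows`, `count_local`, and `mine_distributed` are hypothetical. Threads stand in for separate computers here purely to keep the example self-contained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def partition_rows(rows, workers):
    """Assign each row, in arrival order, to the worker with the least estimated work."""
    buckets = [[] for _ in range(workers)]
    loads = [0] * workers
    for row in rows:
        target = loads.index(min(loads))
        buckets[target].append(row)
        loads[target] += 2 ** len(row)  # assumed cost: about 2^n sub-sequences
    return buckets


def count_local(rows):
    """Count, per worker, how many of its rows contain each sub-sequence."""
    counts = Counter()
    for row in rows:
        distinct = {c for k in range(1, len(row) + 1) for c in combinations(row, k)}
        counts.update(distinct)
    return counts


def mine_distributed(rows, min_support, workers=2):
    buckets = partition_rows(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_local, buckets))
    total = Counter()
    for partial in partials:  # one merge step is the only communication
        total.update(partial)
    return {seq: n for seq, n in total.items() if n >= min_support}


rows = [("a", "b", "c"), ("a", "c"), ("b", "c")]
frequent = mine_distributed(rows, min_support=2, workers=2)
```

Merging once at the end keeps communication independent of the number of rows, at the price of each worker holding its own table of counts.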

http://hdl.handle.net/1993/4814

data mining

databases

distributed computing

Identifier	oai:union.ndltd.org:MANITOBA/oai:mspace.lib.umanitoba.ca:1993/4814
Date	01 September 2011
Creators	Brajczuk, Dale A.
Contributors	Leung, Carson K. (Computer Science), Irani, Pourang (Computer Science), Rajapakse, Athula (Electrical & Computer Engineering)
Source Sets	University of Manitoba Canada
Detected Language	English
