
Vector-Item Pattern Mining Algorithms and their Applications

Advances in storage technology have long driven the need for new data mining techniques. Not only are typical data sets becoming larger, but the diversity of available attributes is also increasing in many problem domains. In biological applications, for example, a single protein may have associated sequence, text, graph, continuous, and item data. Correspondingly, there is a growing need for techniques that find patterns in such complex data. Many techniques exist for mapping specific types of data to vector space representations, such as the bag-of-words model for text [58] or vector space embeddings of graphs [94, 91]. However, few techniques treat the resulting vector space representations as units that can be combined and processed further. This research aims to mine important vector-item patterns hidden across multiple, diverse data sources. We treat sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors, which may or may not show unexpected density fluctuations. Two types of vector-item pattern mining algorithms have been developed: histogram-based algorithms and point distribution algorithms. In the histogram-based algorithms, a vector-item pattern is significant (important) if its density histogram differs significantly from what is expected for a random subset of transactions, as judged by a χ² goodness-of-fit test or an effect size analysis. In the point distribution algorithms, a vector-item pattern is significant if its probability density function (PDF) has a large Kullback–Leibler divergence from those of random subsamples.
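The two significance criteria above can be illustrated with a small, self-contained sketch. It is not the dissertation's implementation: the abstract does not say how multidimensional histograms are binned or how PDFs are estimated, so the following are assumptions. Vectors are binned on a uniform grid (`grid_cells`), Cohen's w stands in for the effect size measure, the PDF is a Laplace-smoothed grid histogram, and "random subsamples" are drawn without replacement from all transactions. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def grid_cells(vectors, bins):
    """Assign each d-dimensional vector to a cell of a uniform grid.

    Hypothetical binning scheme: every dimension is split into `bins`
    equal-width intervals spanning the observed min..max of all vectors.
    """
    dims = len(vectors[0])
    lo = [min(v[j] for v in vectors) for j in range(dims)]
    hi = [max(v[j] for v in vectors) for j in range(dims)]
    # Guard against constant dimensions (zero range).
    widths = [((hi[j] - lo[j]) / bins) or 1.0 for j in range(dims)]
    return [
        tuple(min(int((v[j] - lo[j]) / widths[j]), bins - 1) for j in range(dims))
        for v in vectors
    ]


def chi_square_pattern(cells, mask):
    """Chi-square goodness-of-fit of the item-set subset against all transactions.

    Expected counts are what a random subset of the same size would show,
    i.e. proportional to each cell's share of all transactions.
    Returns (chi2 statistic, Cohen's w effect size, degrees of freedom).
    """
    total = Counter(cells)
    subset = Counter(c for c, m in zip(cells, mask) if m)
    n_all, n_sub = len(cells), sum(subset.values())
    if n_sub == 0:
        raise ValueError("the item set must occur in at least one transaction")
    chi2 = 0.0
    for cell, count in total.items():
        expected = n_sub * count / n_all
        chi2 += (subset[cell] - expected) ** 2 / expected
    return chi2, math.sqrt(chi2 / n_sub), len(total) - 1


def smoothed_pdf(cells, support, alpha=1.0):
    """Histogram density estimate with additive (Laplace) smoothing, so no cell is zero."""
    counts = Counter(cells)
    denom = len(cells) + alpha * len(support)
    return {c: (counts[c] + alpha) / denom for c in support}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same support."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)


def kl_pattern(cells, mask, trials=200, seed=0):
    """Compare the subset's KL divergence with that of random subsamples of equal size.

    Returns (observed KL, empirical p-value); a small p-value means random
    subsamples rarely diverge from the background as much as the item-set subset.
    """
    support = sorted(set(cells))
    background = smoothed_pdf(cells, support)
    subset = [c for c, m in zip(cells, mask) if m]
    observed = kl_divergence(smoothed_pdf(subset, support), background)
    rng = random.Random(seed)
    null = [
        kl_divergence(smoothed_pdf(rng.sample(cells, len(subset)), support), background)
        for _ in range(trials)
    ]
    p_value = (1 + sum(k >= observed for k in null)) / (1 + trials)
    return observed, p_value
```

In use, `mask[i]` is true when transaction `i` contains the item set. A subset concentrated in one region of vector space yields a χ² statistic far above its degrees of freedom and a small empirical p-value, while a random mask does not. The sketch computes only the χ² statistic; a p-value would come from the χ² distribution with `df` degrees of freedom (for example via `scipy.stats.chi2.sf`). Sharing one grid between both tests keeps them comparable; a kernel density estimate could replace the histogram PDF.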
We have applied the vector-item pattern mining algorithms to several application areas, and through comparisons with other state-of-the-art algorithms we demonstrate their effectiveness and efficiency.

https://hdl.handle.net/10365/28841

Data mining.

Pattern recognition systems.

Computer algorithms.

Identifier	oai:union.ndltd.org:ndsu.edu/oai:library.ndsu.edu:10365/28841
Date	January 2011
Creators	Wu, Jianfei
Publisher	North Dakota State University
Source Sets	North Dakota State University
Detected Language	English
Type	text/dissertation
Format	application/pdf
Rights	NDSU Policy 190.6.2, https://www.ndsu.edu/fileadmin/policy/190.pdf
