Doctor of Philosophy / Department of Computing and Information Sciences / William H. Hsu
Association rule learning is a data mining technique that can capture relationships between pairs of entities in different domains. The goal of this research is to discover factors from data that can improve the precision, recall, and accuracy of association rules found using interestingness measures and frequent itemset mining. Such factors can be calibrated using validation data and applied to rank candidate rules in domain-dependent tasks such as link existence prediction. In addition, I use interestingness measures themselves as numerical features to improve link existence prediction. The focus of this dissertation is on developing and testing an analytical framework for association rule interestingness measures, to make them sensitive to the relative size of itemsets. I survey existing interestingness measures and then introduce adaptive parametric models for normalizing and optimizing these measures, based on the size of itemsets containing a candidate pair of co-occurring entities.
The central thesis of this work is that in certain domains, the link strength between entities is related to the rarity of their shared memberships (i.e., the size of itemsets in which they co-occur), and that a data-driven approach can capture such properties by normalizing the quantitative measures used to rank associations. To test this hypothesis under different levels of variability in itemset size, I develop several test bed domains, each containing an association rule mining task and a link existence prediction task. The definitions of itemset membership and link existence in each domain depend on its local semantics.
My primary goals are: to capture quantitative aspects of these local semantics in normalization factors for association rule interestingness measures; to represent these factors as quantitative features for link existence prediction; to apply them to significantly improve precision and recall in several real-world domains; and to build an experimental framework for measuring this improvement, using information theory and classification-based validation.
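The idea that co-occurrence in a small itemset may signal a stronger link than co-occurrence in a large one can be illustrated with a minimal sketch. This is not the dissertation's parametric model: the function names, the toy itemsets, and the exponent `alpha` are all hypothetical, chosen only to show how standard measures (support, confidence, lift) ignore itemset size while a simple size-based weighting does not.

```python
def pair_measures(itemsets, a, b):
    """Standard interestingness measures for the candidate rule a -> b."""
    n = len(itemsets)
    n_a = sum(1 for s in itemsets if a in s)
    n_b = sum(1 for s in itemsets if b in s)
    n_ab = sum(1 for s in itemsets if a in s and b in s)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = support / ((n_a / n) * (n_b / n)) if n_a and n_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


def size_weighted_cooccurrence(itemsets, a, b, alpha=1.0):
    """Illustrative assumption: each shared itemset contributes 1 / size**alpha,
    so small (rare) shared memberships count more. alpha=0 recovers the raw
    co-occurrence count."""
    return sum(len(s) ** -alpha for s in itemsets if a in s and b in s)


# Toy data (hypothetical): each set is one itemset, e.g. one group's members.
itemsets = [
    {"ann", "bob"},
    {"ann", "bob", "cat", "dan", "eve", "fay"},
    {"ann", "cat", "dan", "eve", "fay"},
    {"cat", "dan"},
]

# Both pairs co-occur in exactly two itemsets, so their support is identical.
print(pair_measures(itemsets, "ann", "bob"))
print(pair_measures(itemsets, "cat", "eve"))

# The size-weighted score separates them: ann and bob share a two-member
# itemset, while cat and eve only share large ones.
print(size_weighted_cooccurrence(itemsets, "ann", "bob"))
print(size_weighted_cooccurrence(itemsets, "cat", "eve"))
```

In a setting like the one described above, a parameter such as `alpha` would be calibrated on validation data rather than fixed by hand, and the resulting score could serve as one numerical feature for link existence prediction.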
Identifier | oai:union.ndltd.org:KSU/oai:krex.k-state.edu:2097/1245 |
Date | January 1900 |
Creators | Aljandal, Waleed A. |
Publisher | Kansas State University |
Source Sets | K-State Research Exchange |
Language | en_US |
Detected Language | English |
Type | Dissertation |