Pattern discovery in large binary relations has been extensively studied. An emblematic success in this area concerns frequent itemset mining and its post-processing that derives association rules. In this case, we mine binary relations that encode whether some properties are satisfied or not by some objects. It is however clear that many datasets correspond to n-ary relations where n > 2. For example, adding spatial and/or temporal dimensions (location and/or time when the properties are satisfied by the objects) leads to the 4-ary relation Objects x Properties x Places x Times. Therefore, we study the generalization of association rule mining within arbitrary n-ary relations: the datasets are now Boolean tensors and not only Boolean matrices. Unlike standard rules that involve subsets of only one domain of the relation, in our setting, the head and the body of a rule can include arbitrary subsets of some selected domains. A significant contribution of this thesis concerns the design of interestingness measures for such generalized rules: besides a frequency measures, two different views on rule confidence are considered. The concept of non-redundant rules and the efficient extraction of the non-redundant rules satisfying the minimal frequency and minimal confidence constraints are also studied. To increase the subjective interestingness of rules, we then introduce disjunctions in their heads. It requires to redefine the interestingness measures again and to revisit the redundancy issues. Finally, we apply our new rule discovery techniques to dynamic relational graph analysis. Such graphs can be encoded into n-ary relations (n ≥ 3). Our use case concerns bicycle renting in the Vélo'v system (self-service bicycle renting in Lyon). It illustrates the added-value of some rules that can be computed thanks to our software prototypes.
|Date||23 October 2012|
|Creators||Nguyen, Thi Kim Ngan|
|Publisher||INSA de Lyon|
|Source Sets||CCSD theses-EN-ligne, France|
Page generated in 0.0028 seconds