Association rule mining is one of several approaches to knowledge discovery in databases. Paradoxically, data mining itself can produce so many association rules that it creates a new knowledge management problem: thousands of association rules, or even more, can easily hold in a single data set. The goal of this work is to design a new method for association rule post-processing. The method should be independent of both the software and the application domain. Its output should be a structured description of the whole set of discovered association rules that helps the user work with those rules.

The approach I chose is to split the association rules into clusters, where each cluster contains rules that are more similar to each other than to rules in other clusters. The output of the method is the definition and description of these clusters.

The main contribution of this Ph.D. thesis is the new Multicriterial clustering association rules method. A secondary contribution is a discussion of previously published association rule post-processing methods. The new method produces clusters of rules that cannot be obtained with any of the earlier post-processing methods. With respect to user expectations, these clusters are more relevant and more effective than the results of earlier association rule clustering approaches. The method is based on two orthogonal clusterings of the same set of association rules. The first clustering is based on interestingness measures (confidence, support, interest, etc.). The second is inspired by document clustering in information retrieval: each rule is represented as a vector, in the same way as a document, and this representation is central to the thesis.

The thesis is organized as follows. Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using the KDD methodologies CRISP-DM, SEMMA, GUHA and RAMSYS. Chapter 3 defines association rules and introduces their characteristics, including interestingness measures. Chapter 4 introduces current association rule post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new Multicriterial clustering association rules method. Chapter 7 presents several experiments. Chapter 8 discusses possible uses and further development of the new method.
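The two orthogonal views of a rule described above (interestingness measures, and a document-like vector) can be illustrated with a small sketch. It is not the thesis's implementation. The following choices are assumptions made for illustration: the `Rule` record, the helper names `measures`, `terms` and `cosine`, the use of four-fold contingency counts `a, b, c, d` (in the style of GUHA's 4ft tables) as input, the reading of "interest" as lift, P(A and C) / (P(A) P(C)), and the side-prefixed bag-of-items term scheme. The counts in the demo are invented toy values.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: antecedent => consequent plus its
    four-fold contingency counts over the data set."""
    antecedent: frozenset
    consequent: frozenset
    a: int  # rows satisfying both antecedent and consequent
    b: int  # antecedent true, consequent false
    c: int  # antecedent false, consequent true
    d: int  # neither antecedent nor consequent


def measures(r: Rule) -> tuple:
    """Interestingness vector for the first clustering.
    Assumed measures: support, confidence and lift ("interest")."""
    n = r.a + r.b + r.c + r.d
    support = r.a / n if n else 0.0
    confidence = r.a / (r.a + r.b) if r.a + r.b else 0.0
    denom = (r.a + r.b) * (r.a + r.c)
    lift = r.a * n / denom if denom else 0.0
    return (support, confidence, lift)


def terms(r: Rule) -> Counter:
    """Document-like bag of terms for the second clustering.
    Items are prefixed by side (an assumption), so an item in the
    antecedent is a different term from the same item in the consequent."""
    return Counter([f"A:{i}" for i in r.antecedent] +
                   [f"C:{i}" for i in r.consequent])


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two term vectors, as in vector-space IR."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy rules with invented counts (n = 100 rows each).
    r1 = Rule(frozenset({"age=young"}), frozenset({"buys=yes"}), 30, 10, 20, 40)
    r2 = Rule(frozenset({"age=young", "income=high"}), frozenset({"buys=yes"}), 20, 5, 30, 45)
    print(measures(r1))                            # (0.3, 0.75, 1.5)
    print(measures(r2))                            # (0.2, 0.8, 1.6)
    print(round(cosine(terms(r1), terms(r2)), 4))  # 0.8165
```

Each rule thus gets two independent feature vectors. Running a standard clustering algorithm on each (for example k-means on the measure vectors, or hierarchical clustering on cosine distance 1 − cos) would yield two orthogonal partitions of the same rule set. This abstract does not say which clustering algorithms or term weighting the thesis actually uses.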
Identifier | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:77103 |
Date | January 2002 |
Creators | Kejkula, Martin |
Contributors | Rauch, Jan, Berka, Petr, Máša, Petr |
Publisher | Vysoká škola ekonomická v Praze |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/doctoralThesis |
Rights | info:eu-repo/semantics/restrictedAccess |