Title: A study of applying copulas in data mining
Author: Martin Ščavnický
Department: Department of Theoretical Computer Science and Mathematical Logic
Supervisor: RNDr. Ing. Martin Holeňa CSc., Department of Theoretical Computer Science and Mathematical Logic
Abstract: Copulas are functions that describe the relationship between a multivariate distribution function and its marginals. They provide a way to model multivariate distribution functions, and are extensively used in finance and studied in data mining. In practice, there are many different copula families and no standard way for choosing the right one. In our work, we compare suitability of different copula families in data mining. We fit classification data using 8 copula families and compare them using 3 measures of fit. We also use a classification algorithm based on copulas and compare its accuracy for different copula families. The results indicate that elliptical copulas fit our data better, but hierarchical Archimedean copulas give comparable accuracy in the classification. We also propose and test a modified method for modelling data using hierarchical Archimedean copulas, which fits some datasets with negative dependence between attributes better. Based on this modified method, we propose a visualization of dependence in data and observe...
Identifier | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:324584 |
Date | January 2013 |
Creators | Ščavnický, Martin |
Contributors | Holeňa, Martin, Hauzar, David |
Source Sets | Czech ETDs |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |