Maximum entropy modelling for quantifying unexpectedness of data mining results

This thesis is concerned with the problem of finding subjectively interesting patterns in data. The focus is restricted to the most prominent notion of subjective interestingness, namely the unexpectedness of a pattern. A pattern is considered unexpected if it contradicts the user's prior knowledge or beliefs about the data. Recently, a general information-theoretic framework for data mining that naturally incorporates unexpectedness was devised. The proposed approach relies on: (1) the Maximum Entropy principle for encoding the user's prior knowledge about the data or the patterns, (2) the InfRatio measure, an information-theoretic measure for evaluating the unexpectedness of a pattern, and (3) a set covering algorithm for finding the most interesting set of patterns. However, this framework is intentionally phrased in abstract terms and has been formally applied to only a limited range of data mining tasks. This thesis is meant to fill this gap: its main contribution is the formalization of this general framework for specific data mining tasks, in order to demonstrate the wide applicability of the framework in practice. In particular, we instantiate the three main components of the framework in order to evaluate frequent itemsets, clusterings, and patterns found in real-valued data such as biclusters and subgroups. Additionally, we provide the first literature review of interestingness measures based on unexpectedness and propose a novel classification of the methods into two classes, namely the "syntactical" and "probabilistic" approaches. We show that exploiting the framework for finding subjectively interesting sets of patterns in data is a highly efficient practice in theoretical, algorithmic and computational terms.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:618550
Date: January 2013
Creators: Kontonasios, Kleanthis-Nikolaos
Publisher: University of Bristol
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
