Return to search

A method for the evaluation of similarity measures on graphs and network-structured data

Measures of similarity play a subtle but important role in a large number of disciplines. For example, a researcher in bioinformatics may devise a new computed measure of similarity between biological structures, and use its scores to infer bio-logical association. Other academics may use related approaches in structured text search, or for object recognition in computer vision. These are diverse and practical applications of similarity. A critical question is this: to what extent can a given similarity measure be trusted? This is a difficult problem, at the heart of which lies the broader issue: what exactly constitutes good similarity judgement? This research presents the view that similarity measures have properties of judgement that are intrinsic to their formulation, and that such properties are measurable. The problem of comparing similarity measures is one of identifying ground-truths for similarity. The approach taken in this work is to examine the relative ordering of graph pairs, when compared with respect to a common reference graph. Ground- truth outcomes are obtained from a novel theory: the theory of irreducible change in graphs. This theory supports stronger claims than those made for edit distances. Whereas edit distances are sensitive to a configuration of costs, irreducible change under the new theory is independent of such parameters. Ground-truth data is obtained by isolating test cases for which a common outcome is assured for all possible least measures of change that can be formulated within a chosen change descriptor space. By isolating these specific cases, and excluding others, the research introduces a framework for evaluating similarity measures on mathematically defensible grounds. The evaluation method is demonstrated in a series of case studies which evaluate the similarity performance of known graph similarity measures. The findings of these experiments provide the first general characterisation of common similarity measures over a wide range of graph properties. The similarity computed from the maximum common induced subgraph (Dice-MCIS) is shown to provide good general similarity judgement. However, it is shown that Blondel's similarity measure can exceed the judgement sensitivity of Dice-MCIS, provided the graphs have both sufficient attribute label diversity, and edge density. The final contribution is the introduction of a new similarity measure for graphs, which is shown to have statistically greater judgement sensitivity than all other measures examined. All of these findings are made possible through the theory of irreducible change in graphs. The research provides the first mathematical basis for reasoning about the quality of similarity judgments. This enables researchers to analyse similarity measures directly, making similarity measures first class objects of scientific inquiry.

  1. vital:10494
Identiferoai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:nmmu/vital:10494
Date January 2014
CreatorsNaude, Kevin Alexander
PublisherNelson Mandela Metropolitan University, Faculty of Science
Source SetsSouth African National ETD Portal
LanguageEnglish
Detected LanguageEnglish
TypeThesis, Doctoral, DPhil
Formatxi, 259 leaves, pdf
RightsNelson Mandela Metropolitan University

Page generated in 0.0044 seconds