Return to search
## A method for the evaluation of similarity measures on graphs and network-structured data

Measures of similarity play a subtle but important role in a large number of disciplines. For example, a researcher in bioinformatics may devise a new computed measure of similarity between biological structures, and use its scores to infer bio-logical association. Other academics may use related approaches in structured text search, or for object recognition in computer vision. These are diverse and practical applications of similarity. A critical question is this: to what extent can a given similarity measure be trusted? This is a difficult problem, at the heart of which lies the broader issue: what exactly constitutes good similarity judgement? This research presents the view that similarity measures have properties of judgement that are intrinsic to their formulation, and that such properties are measurable. The problem of comparing similarity measures is one of identifying ground-truths for similarity. The approach taken in this work is to examine the relative ordering of graph pairs, when compared with respect to a common reference graph. Ground- truth outcomes are obtained from a novel theory: the theory of irreducible change in graphs. This theory supports stronger claims than those made for edit distances. Whereas edit distances are sensitive to a configuration of costs, irreducible change under the new theory is independent of such parameters. Ground-truth data is obtained by isolating test cases for which a common outcome is assured for all possible least measures of change that can be formulated within a chosen change descriptor space. By isolating these specific cases, and excluding others, the research introduces a framework for evaluating similarity measures on mathematically defensible grounds. The evaluation method is demonstrated in a series of case studies which evaluate the similarity performance of known graph similarity measures. The findings of these experiments provide the first general characterisation of common similarity measures over a wide range of graph properties. The similarity computed from the maximum common induced subgraph (Dice-MCIS) is shown to provide good general similarity judgement. However, it is shown that Blondel's similarity measure can exceed the judgement sensitivity of Dice-MCIS, provided the graphs have both sufficient attribute label diversity, and edge density. The final contribution is the introduction of a new similarity measure for graphs, which is shown to have statistically greater judgement sensitivity than all other measures examined. All of these findings are made possible through the theory of irreducible change in graphs. The research provides the first mathematical basis for reasoning about the quality of similarity judgments. This enables researchers to analyse similarity measures directly, making similarity measures first class objects of scientific inquiry.

Identifer | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:nmmu/vital:10494 |

Date | January 2014 |

Creators | Naude, Kevin Alexander |

Publisher | Nelson Mandela Metropolitan University, Faculty of Science |

Source Sets | South African National ETD Portal |

Language | English |

Detected Language | English |

Type | Thesis, Doctoral, DPhil |

Format | xi, 259 leaves, pdf |

Rights | Nelson Mandela Metropolitan University |

Page generated in 0.0017 seconds