
Methods and measures for statistical fault localisation

Fault localisation is the process of finding the causes of a given error, and is one of the most costly elements of software development. One of the most efficient approaches to fault localisation appeals to statistical methods. These methods are characterised by their ability to estimate how faulty a program artefact is as a function of statistical information about a given program and test suite. However, the major problem facing statistical approaches is their effectiveness, particularly with respect to finding single (or multiple) faults in large programs typical of the real world. A solution to this problem hinges on discovering new formal properties of faulty programs and developing scalable statistical techniques which exploit them. In this thesis I address this by identifying new properties of faulty programs, developing formal frameworks and methods which are formally proven to exploit them, and demonstrating that many of our new techniques substantially and statistically significantly outperform competing algorithms at given fault localisation tasks (using p = 0.01) on what is, to our knowledge, one of the largest-scale sets of experiments in fault localisation to date. This research is thus designed to corroborate the following thesis statement: that the new algorithms presented in this thesis are effective and efficient at software fault localisation and outperform state-of-the-art statistical techniques at a range of fault localisation tasks.

In more detail, the major thesis contributions are as follows:

1. We perform a thorough investigation into the existing framework of spectrum-based fault localisation (sbfl), which currently stands at the cutting edge of statistical fault localisation. To improve on the effectiveness of sbfl, our first contribution is to introduce and motivate many new statistical measures which can be used within this framework. First, we show that many of these measures are well motivated for the task of sbfl. Second, we formally prove equivalence properties of large classes of measures. Third, we show that many of the measures perform competitively with the existing measures in experimentation. In particular, our new measure m9185 outperforms all existing measures on average in terms of effectiveness and, along with Kulczynski2, is in a class of measures which statistically significantly outperforms all other measures at finding a single fault in a program (p = 0.01).

2. Having investigated sbfl, our second contribution is to motivate, introduce, and formally develop a new formal framework which we call probabilistic fault localisation (pfl). pfl is similar to sbfl insofar as it can leverage any suspiciousness measure, and is designed to directly estimate the probability that a given program artefact is faulty. First, we formally prove that pfl is theoretically superior to sbfl insofar as it satisfies and exploits a number of desirable formal properties which sbfl does not. Second, we experimentally show that pfl methods (namely, our measure pfl-ppv) substantially and statistically significantly outperform the best performing sbfl measures at finding a fault in large multiple-fault programs (p = 0.01). Furthermore, we show that for many of our benchmarks it is theoretically impossible to design strictly rational sbfl measures which outperform given pfl techniques.

3. Having addressed the problem of localising a single fault in a program, we address the problem of localising multiple faults. Accordingly, our third major contribution is the introduction and motivation of a new algorithm M<sub>Opt(g)</sub> which optimises any ranking-based method g (such as pfl/sbfl/Barinel) to the task of multiple fault localisation. First, we prove that M<sub>Opt(g)</sub> formally satisfies and exploits a newly identified formal property of multiple fault optimality. Second, we experimentally show that there are values for g such that M<sub>Opt(g)</sub> substantially and statistically significantly outperforms given ranking-based fault localisation methods at the task of finding multiple faults (p = 0.01).

4. Having developed methods for localising faults as a function of a given test suite, we finally address the problem of optimising test suites for the purposes of fault localisation. Accordingly, we first present an algorithm which leverages model checkers to improve a given test suite by making it satisfy a property of single bug optimality. Second, we experimentally show that on small benchmarks single bug optimal test suites can be generated (from scratch) efficiently when the algorithm is used in conjunction with the cbmc model checker, and that the test suite generated can be used effectively for fault localisation.
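As an illustration of the sbfl setting the abstract refers to, the following is a minimal sketch of ranking program statements by a suspiciousness measure. It is not taken from the thesis: the function names, the toy coverage data, and the tie-breaking rule are all hypothetical, and the Kulczynski2 formula used is the commonly cited definition from the sbfl literature, 0.5 * (ef/(ef+nf) + ef/(ef+ep)), where ef and ep count failing and passing tests that execute a statement and nf counts failing tests that do not.

```python
from typing import Dict, List, Set, Tuple


def kulczynski2(ef: int, ep: int, nf: int) -> float:
    """Commonly cited Kulczynski2 measure; a zero denominator contributes 0."""
    recall = ef / (ef + nf) if (ef + nf) else 0.0
    precision = ef / (ef + ep) if (ef + ep) else 0.0
    return 0.5 * (recall + precision)


def rank_statements(
    coverage: List[Set[str]], failed: List[bool]
) -> List[Tuple[str, float]]:
    """Rank statements by suspiciousness, most suspicious first.

    coverage[i] is the set of statements executed by test i (assumed input
    shape); failed[i] is True when test i fails.
    """
    statements = sorted(set().union(*coverage))
    total_failed = sum(failed)
    scores: Dict[str, float] = {}
    for s in statements:
        ef = sum(1 for cov, f in zip(coverage, failed) if f and s in cov)
        ep = sum(1 for cov, f in zip(coverage, failed) if not f and s in cov)
        nf = total_failed - ef
        scores[s] = kulczynski2(ef, ep, nf)
    # Hypothetical tie-break: alphabetical order of statement names.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy example: four tests over four statements; s3 runs only in failing tests.
coverage = [{"s1", "s2", "s3"}, {"s1", "s3"}, {"s1", "s2"}, {"s1", "s4"}]
failed = [True, True, False, False]
ranking = rank_statements(coverage, failed)
for name, score in ranking:
    print(name, score)
```

In this toy data the statement executed by every failing test and by no passing test is ranked first, which is the intuition sbfl measures are meant to capture; other measures can be substituted for `kulczynski2` without changing the ranking code.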

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:730271
Date: January 2016
Creators: Landsberg, David
Contributors: Kroening, Daniel ; Chockler, Hana
Publisher: University of Oxford
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: https://ora.ox.ac.uk/objects/uuid:cf737e06-9f12-44fa-94d2-a8d247ad808e
