Return to search

Strip-Miner: Automatic Bug Detection in Large Software Code with Low False Positive Rate

There are a number of techniques for automatic bug detection, most of them have a high false positive rate when used in practice. This work proposes an approach, named Strip-Miner, that combines simple dependency analysis of code with a data mining technique "frequent itemset mining" to reduce the false positive rate. We adopt a two phase approach 1) finding the potential bugs and 2) filtering the false positive ones. In the first phase we extract code elements and dependencies among them using static analysis and frequent itemset mining to find programming patterns where a deviation from these patterns is considered as a potential bug. In the second phase, we use the extracted dependencies to build dependency chains between program elements in a programming pattern and a lack of such a chain potentially makes a bug false positive.

Our evaluation on a set of 7 benchmarks consisting of large software code including OpenSSL, PostgreSQL, Git, FFMPEG, SQLite, Binutils and Putty shows that combining simple de- pendency analysis with pattern mining can significantly decrease the number of generated bugs. Using our approach we are able to reduce the number of generated bugs by up to 99.9% with a false positive rate of 65.19% and true positive rate of 34.18% on average as compared to an earlier frequent itemset mining based approach "PR-Miner". / Master of Science / Every software code has bugs in it that can change its expected behavior. There have been a lot of efforts to automate the process of bug detection but most of the techniques proposed have a high rate of false alarms. Some of these techniques leverage the information available in software code to extract programming patterns that can be used to find potential bugs. Although such an approach has proved to be fruitful for finding bugs but large number of false alarms makes it almost useless in software development.

The elements present in a software code have relationships among them formally known as dependencies and the process of finding them is known as dependency analysis. There is a technique known as market basket analysis used by large retailers to find association between items. It works by looking for combinations of items that occur together frequently in transactions. Similarly, in a software code combinations of elements that occur together, can be used to find association between them. This technique is formally known as frequent itemset mining in the data mining domain. This work proposes an approach, named Strip- Miner, that combines dependency analysis with frequent itemset mining to reduce the rate of false alarms. We adopt a two phase approach 1)finding the potential bugs in code and 2)filtering the false alarms. In the first phase we extract code elements and dependencies among them and use frequent itemset mining to find programming patterns where a deviation from these patterns is considered as a potential bug. In the second phase, we use the extracted dependencies to build dependency chains between program elements present in a programming pattern and lack of such a chain is an indication of false alarm.

Our evaluation on a set of 7 benchmarks consisting of large software code including version control systems, database management systems, software security libraries and utility software like media players shows that combining simple dependency analysis with frequent itemset mining can significantly decrease the rate of false alarms. Using our approach we are able to reduce the number of generated bugs by up to 99.9% with a false alarms rate of 65.19% and real bugs rate of 34.18% on average as compared to an earlier frequent itemset mining based approach "PR-Miner".

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/97934
Date28 April 2020
CreatorsIbrar, Fahad
ContributorsComputer Science, Hicks, Matthew, Servant Cortes, Francisco Javier, Butt, Ali R.
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeThesis
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0023 seconds