Targeted software testing is a critical task in development of secure software. The core challenge of targeted software testing is to generate many inputs that reach specific code target locations in a given program. However, this task is challenging because it is NP-hard in theory and real-world programs contain very large input spaces and many lines of code, making this difficult in practice.
In this thesis, I introduce a new approach for targeted software testing based on adaptive sampling. The key insight is to reduce the original problem to a sequence of approximate counting problems, and I apply this approach to targeted software testing in two stages.
First, to find a single target-reaching input when no such input is given, I develop a new search algorithm MC2 that adaptively uses approximate-count feedback to narrow down which input region is more likely to contain a target-reaching input using probabilistic bisection.
Second, given a single target-reaching input, I develop a new set approximation algorithm ProgramSampler that adaptively learns an approximation of the set of target-reaching inputs based on approximate-count feedback, where the set approximation can be efficiently uniformly sampled for many target-reaching inputs.
Backed by theoretical guarantees, these techniques have been highly effective in practice: outperforming existing methods on average by 1-2 orders of magnitude.
Identifer | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/ggq1-9122 |
Date | January 2024 |
Creators | Shah, Abhishek |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |
Page generated in 0.0023 seconds