Modern software systems have been pervasively concurrent to utilize parallel hardware and perform asynchronous tasks. The correctness of concurrent programming, however, has been challenging for real-world and large systems. As the concurrent events of a system can interleave arbitrarily, unexpected interleavings may lead the system to undefined states, resulting in denials of services, performance degradation, inconsistent data, security issues, etc. To detect such concurrency errors, concurrency testing repeatedly explores the interleavings of a system to find the ones that induce errors. Traditional systematic testing, however, suffers from the intractable number of interleavings due to the complexity in real-world systems. Moreover, each iteration in systematic testing adjusts the explored interleaving with a minimal change that swaps the ordering of two events. Such exploration may waste time in large homogeneous sub-spaces leading to the same testing result. Thus on real-world systems, systematic testing often performs poorly to reveal even simple errors within a limited time budget. On the other hand, randomized testing samples interleavings of the system to quickly surface simple errors with substantial chances, but it may as well explore equivalent interleavings that do not affect the testing results. Such redundancies weaken the probabilistic guarantees and performance of randomized testing to find any errors.
Towards effective concurrency testing, this thesis leverages partial order semantics with randomized testing to find errors with strong probabilistic guarantees. First, we propose partial order sampling (POS), a new randomized testing framework to sample interleavings of a concurrent program with a novel partial order method. It effectively and simultaneously explores the orderings of all events of the program, and has high probabilities to manifest any errors of unexpected interleavings. We formally proved that our approach has exponentially better probabilistic guarantees to sample any partial orders of the program than state-of-the-art approaches. Our evaluation over 32 known concurrency errors in public benchmarks shows that our framework performed 2.6 times better than state-of-the-art approaches to find the errors. Secondly, we describe Morpheus, a new practical concurrency testing tool to apply POS to high-level distributed systems in Erlang. Morpheus leverages dynamic analysis to identify and predict critical events to reorder during testing, and significantly improves the exploration effectiveness of POS. We performed a case study to apply Morpheus on four popular distributed systems in Erlang, including Mnesia, the database system in standard Erlang distribution, and RabbitMQ, the message broker service. Morpheus found 11 previously unknown errors leading to unexpected crashes, deadlocks, and inconsistent states, demonstrating the effectiveness and practicalness of our approaches.
Identifer | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-0fvn-dn65 |
Date | January 2020 |
Creators | Yuan, Xinhao |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |
Page generated in 0.0015 seconds