Return to search

Sound Modular Extraction of Control Flow Graphs from Java Bytecode

Control flow graphs (CFGs) are abstract program models that preserve the control flow information. They have been widely utilized for many static analyses in the past decades. Unfortunately, previous studies about the CFG construction from modern languages, such as Java, have either neglected advanced features that influence the control flow, or do not provide a correctness argument. This is a bearable issue for some program analyses, but not for formal methods, where the soundness of CFGs is a mandatory condition for the verification of safety-critical properties. Moreover, when developing open systems, i.e., systems in which at least one component is missing, one may want to extract CFGs to verify the available components. Soundness is even harder to achieve in this scenario, because of the unknown inter-dependencies involving missing components. In this work we present two variants of a CFG extraction algorithm from Java bytecode considering precise exceptional flow, which are sound w.r.t to the JVM behavior. The first algorithm extracts CFGs from fully-provided (closed) programs only. It proceeds in two phases. Initially the Java bytecode is translated into a stack-less intermediate representation named BIR, which provides explicit representation of exceptions, and is more compact than the original bytecode. Next, we define the transformation from BIR to CFGs, which, among other features, considers the propagation of uncaught exceptions within method calls. We then establish its correctness: the behavior of the extracted CFGs is shown to be a sound over-approximation of the behavior of the original programs. Thus, temporal safety properties that hold for the CFGs also hold for the program. We prove this by suitably combining the properties of the two transformations with those of a previous idealized CFG extraction algorithm, whose correctness has been proven directly. The second variant of the algorithm is defined for open systems. We generalize the extraction algorithm for closed systems for a modular set-up, and resolve inter-dependencies involving missing components by using user-provided interfaces. We establish its correctness by defining a refinement relation between open systems, which constrains the instantiation of missing components. We prove that if the relation holds, then the CFGs extracted from the components of the original open system are sound over-approximations of the CFGs for the same components in the refined system. Thus, temporal safety properties that hold for an open system also hold for closed systems that refine it. We have implemented both algorithms as the ConFlEx tool. It uses Sawja, an external library for the static analysis of Java bytecode, to transform bytecode into BIR, and to resolve virtual method calls. We have extended Sawja to support open systems, and improved its exception type analysis. Experimental results have shown that the algorithm for closed systems generates more precise CFGs than the modular algorithm. This was expected, due to the heavy over-approximations the latter has to perform to be sound. Also, both algorithms are linear in the number of bytecode instructions. Therefore, ConFlEx is efficient for the extraction of CFGs from either open, or closed Java bytecode programs. / <p>QC 20121122</p>

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-105275
Date January 2012
Creatorsde Carvalho Gomes, Pedro
PublisherKTH, Teoretisk datalogi, TCS, Stockholm
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeLicentiate thesis, monograph, info:eu-repo/semantics/masterThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationTrita-CSC-A, 1653-5723 ; 2012:16

Page generated in 0.0028 seconds