The translation of binary code to higher-level models has wide applications, including decompilation, binary analysis, and binary rewriting. This calls for high reliability of the underlying trusted computing base (TCB) of the translation methodology. A key challenge is to reduce the TCB by validating its soundness. Both the definition of soundness and the validation method heavily depend on the context: what is in the TCB and how to prove it. This dissertation presents three research contributions. The first two contributions include reducing the TCB in binary verification, and the last contribution includes a binary verification process that leverages a reduced TCB.
The first contribution targets the validation of OCaml-to-PVS translation -- commonly used to translate instruction-set-architecture (ISA) specifications to PVS -- where the destination language is non-executable. We present a methodology called OPEV to validate the translation between OCaml and PVS, supporting non-executable semantics. The validation includes generating large-scale tests for OCaml implementations, generating test lemmas for PVS, and generating proofs that automatically discharge these lemmas. OPEV incorporates an intermediate type system that captures a large subset of OCaml types, employing a variety of rules to generate test cases for each type. To prove the PVS lemmas, we develop automatic proof strategies and discharge the test lemmas using PVS Proof-Lite, a powerful proof scripting utility of the PVS verification system. We demonstrate our approach in two case studies that include 259 functions selected from the Sail and Lem libraries. For each function, we generate thousands of test lemmas, all of which are automatically discharged.
The dissertation's second contribution targets the soundness validation of a disassembly process where the source language does not have well-defined semantics.
Disassembly is a crucial step in binary security, reverse engineering, and binary verification. Various studies in these fields use disassembly tools and hypothesize that the reconstructed disassembly is correct. However, disassembly is an undecidable problem. State-of-the-art disassemblers suffer from issues ranging from incorrectly recovered instructions to incorrectly assessing which addresses belong to instructions and which to data. We present DSV, a systematic and automated approach to validate whether the output of a disassembler is sound with respect to the input binary. No source code, debugging information, or annotations are required. DSV defines soundness using a transition relation defined over concrete machine states: a binary is sound if, for all addresses in the binary that can be reached from the binary's entry point, the bytes of the (disassembled) instruction located at an address are the same as the actual bytes read from the binary. Since computing this transition relation is undecidable, DSV uses over-approximation by preventing false positives (i.e., the existence of an incorrectly disassembled reachable instruction but deemed unreachable) and allowing, but minimizing, false negatives. We apply DSV to 102 binaries of GNU Coreutils with eight different state-of-the-art disassemblers from academia and industry. DSV is able to find soundness issues in the output of all disassemblers.
The dissertation's third contribution is WinCheck: a concolic model checker that detects memory-related properties of closed-source binaries. Bugs related to memory accesses are still a major issue for security vulnerabilities. Even a single buffer overflow or use-after-free in a large program may be the cause of a software crash, a data leak, or a hijacking of the control flow. Typical static formal verification tools aim to detect these issues at the source code level. WinCheck is a model-checker that is directly applicable to closed-source and stripped Windows executables. A key characteristic of WinCheck is that it performs its execution as symbolically as possible while leaving any information related to pointers concrete.
This produces a model checker tailored to pointer-related properties, such as buffer overflows, use-after-free, null-pointer dereferences, and reading from uninitialized memory. The technique thus provides a novel trade-off between ease of use, accuracy, applicability, and scalability. We apply WinCheck to ten closed-source binaries available in a Windows 10 distribution, as well as the Windows version of the entire Coreutils library. We conclude that the approach taken is precise -- provides only a few false negatives -- but may not explore the entire state space due to unresolved indirect jumps. / Doctor of Philosophy / Binary verification is a process that verifies a class of properties, usually security-related properties, on binary files, and does not need access to source code.
Since a binary file is composed of byte sequences and is not human-readable, in the binary verification process, a number of assumptions are usually made. The assumptions often involve the error-free nature of a set of subsystems used in the verification process and constitute the verification process's trusted computing base (or TCB). The reliability of the verification process therefore depends on how reliable the TCB is. The dissertation presents three research contributions in this regard. The first two contributions include reducing the TCB in binary verification, and the last contribution includes a binary verification process that leverages a reduced TCB.
The dissertation's first contribution presents a validation on OCaml-to-PVS translations -- commonly used to translate a computer architecture's instruction specifications to PVS, a language that allows mathematical specifications. To build up a reliable semantical model of assembly instructions, which is assumed to be in the TCB, it is necessary to validate the translation.
The dissertation's second contribution validates the soundness of the disassembly process, which translates a binary file to corresponding assembly instructions.
Since the disassembly process is generally assumed to be trustworthy in many binary verification works, the TCB of binary verification could be reduced by validating the soundness of the disassembly process.
With the reduced TCB, the dissertation introduces WinCheck, the dissertation's third and final contribution: a concolic model checker that validates pointer-related properties of closed-source Windows binaries. The pointer-related properties include absence of buffer overflow, absence of use-after-free, and absence of null-pointer dereference.
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/110791 |
Date | 15 June 2022 |
Creators | An, Xiaoxin |
Contributors | Electrical and Computer Engineering, Ravindran, Binoy, Lee, Dongyoon, Chantem, Thidapat, Zeng, Haibo, Verbeek, Freek |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Page generated in 0.003 seconds