1 |
Abstraction Recovery for Scalable Static Binary AnalysisSchwartz, Edward J. 01 May 2014 (has links)
Many source code tools help software programmers analyze programs as they are being developed, but such tools can no longer be applied once the final programs are shipped to the user. This greatly limits users, security experts, and anyone other than the programmer who wishes to perform additional testing and program analysis. This dissertation is concerned with the development of scalable techniques for statically analyzing binary programs, which can be employed by anyone who has access to the binary. Unfortunately, static binary analysis is often more difficult than static source code analysis because the abstractions that are the basis of source code programs, such as variables, types, functions, and control flow structure, are not explicitly present in binary programs. Previous approaches work around the the lack of abstractions by reasoning about the program at a lower level, but this approach has not scaled as well as equivalent source code techniques that use abstractions. This dissertation investigates an alternative approach to static binary analysis which is called abstraction recovery. The premise of abstraction recovery is that since many binaries are actually compiled from an abstract source language which is more suitable for analysis, the first step of static binary analysis should be to recover such abstractions. Abstraction recovery is shown to be feasible in two real-world applications. First, C abstractions are recovered by a newly developed decompiler. The second application recovers gadget abstractions to automatically generate return-oriented programming (ROP) attacks. Experiments using the decompiler demonstrate that recovering C abstractions improves scalability over low-level analysis, with applications such as verification and detection of buffer overflows seeing an average of 17× improvement. Similarly, gadget abstractions speed up automated ROP attacks by 99×. Though some binary analysis problems do not lend themselves to abstraction recovery because they reason about low-level or syntactic details, abstraction recovery is an attractive alternative to conventional low-level analysis when users are interested in the behavior of the original abstract program from which a binary was compiled, which is often the case.
|
2 |
Predictors of Ransomware from Binary AnalysisOtis, Aaron M 01 September 2019 (has links) (PDF)
Ransomware, a type of malware that extorts payment from a victim by encrypting her data, is a growing threat that is becoming more sophisticated with each generation. Attackers have shifted from targeting individuals to entire organizations, raising extortions from hundreds of dollars to hundreds of thousands of dollars. In this work, we analyze a variety of ransomware and benign software binaries in order to identify indicators that may be used to detect ransomware. We find that several combinations of strings, cryptographic constants, and a large number loops are key indicators useful for detecting ransomware.
|
3 |
Deobfuscation of Packed and Virtualization-Obfuscation Protected BinariesCoogan, Kevin Patrick January 2011 (has links)
Code obfuscation techniques are increasingly being used in software for such reasons as protecting trade secret algorithms from competitors and deterring license tampering by those wishing to use the software for free. However, these techniques have also grown in popularity in less legitimate areas, such as protecting malware from detection and reverse engineering. This work examines two such techniques - packing and virtualization-obfuscation - and presents new behavioral approaches to analysis that may be relevant to security analysts whose job it is to defend against malicious code. These approaches are robust against variations in obfuscation algorithms, such as changing encryption keys or virtual instruction byte code.Packing refers to the process of encrypting or compressing an executable file. This process "scrambles" the bytes of the executable so that byte-signature matching algorithms commonly used by anti-virus programs are ineffective. Standard static analysis techniques are similarly ineffective since the actual byte code of the program is hidden until after the program is executed. Dynamic analysis approaches exist, but are vulnerable to dynamic defenses. We detail a static analysis technique that starts by identifying the code used to "unpack" the executable, then uses this unpacker to generate the unpacked code in a form suitable for static analysis. Results show we are able to correctly unpack several encrypted and compressed malware, while still handling several dynamic defenses.Virtualization-obfuscation is a technique that translates the original program into virtual instructions, then builds a customized virtual machine for these instructions. As with packing, the byte-signature of the original program is destroyed. Furthermore, static analysis of the obfuscated program reveals only the structure of the virtual machine, and dynamic analysis produces a dynamic trace where original program instructions are intermixed, and often indistinguishable from, virtual machine instructions. We present a dynamic analysis approach whereby all instructions that affect the external behavior of the program are identified, thus building an approximation of the original program that is observationally equivalent. We achieve good results at both identifying instructions from the original program, as well as eliminating instructions known to be part of the virtual machine.
|
4 |
FORCED EXECUTION FOR SECURITY ANALYSIS OF SOFTWARE WITHOUT SOURCE CODEFei Peng (10682163) 03 May 2021 (has links)
<div><div><div><p>Binary code analysis is widely used in many applications, including reverse engineering, software forensics and security. It is very critical in these applications, since the analysis of binary code does not require source code to be available. For example, in one of the security applications, given a potentially malicious executable file, binary analysis can help building human inspectable representations such as control flow graph and call graph.</p><p>Existing binary analysis can be roughly classified into two categories, that are static analysis, and dynamic analysis. Both types of analysis have their own strengths and limitations. Static binary analysis is based on the result of scanning the binary code without executing it. It usually has good code coverage, but the analysis results are sometimes not quite accurate due to the lack of dynamic execution information. Dynamic binary analysis, on the other hand, is based on executing the binary on a set of inputs. On the contrast, the results are usually accurate but heavily rely on the coverage of the test inputs, which sometimes do not exist.</p><p>In this thesis, we first present a novel systematic binary analysis framework called X-Force. Basically, X-Force can force the binary to execute without using any inputs or proper environment setup. As part of the design of our framework, we have proposed a number of techniques, that includes (1) path exploration module which can drive the program to execute different paths; (2) a crash-free execution model that could detect and recover from execution exceptions properly; (3) overcoming a large number of technical challenges in making the technique work on real world binaries.</p><p>Although X-Force is a highly effective method to penetrate malware self-protection and expose hidden behavior, it is very heavy-weight. The reason is that it requires tracing individual instructions, reasoning about pointer alias relations on-the-fly, and repairing invalid pointers by on-demand memory allocation. To further solve this problem, we develop a light-weight and practical forced execution technique. Without losing analysis precision, it avoids tracking individual instructions and on-demand allocation. Under our scheme, a forced execution is very similar to a native one. It features a novel memory pre-planning phase that pre-allocates a large memory buffer, and then initializes the buffer, and variables in the subject binary, with carefully crafted values in a random fashion before the real execution. The pre-planning is designed in such a way that dereferencing an invalid pointer has a very large chance to fall into the pre-allocated region and hence does not cause any exception, and semantically unrelated invalid pointer dereferences highly likely access disjoint (pre-allocated) memory regions, avoiding state corruptions with probabilistic guarantees.</p></div></div></div>
|
5 |
Lambda Calculus for Binary Security and AnalysisStaursky, Joseph N. 30 September 2021 (has links)
No description available.
|
6 |
Automatic Deobfuscation and Reverse Engineering of Obfuscated CodeYadegari, Babak January 2016 (has links)
Automatic malware analysis is an essential part of today's computer security practices. Nearly one million malware samples were delivered to the analysts on a daily basis on year 2014 alone while the number of samples submitted for analysis increases almost exponentially each year. Given the size of the threat we are facing today and the amount of malicious codes emerging every day, the ability to automatically analyze unknown and unwanted software is critically important more than ever. On the other hand, malware writers adapt their malicious codes to new security measurements to protect them from being exposed and detected. This is usually achieved by employing obfuscation techniques that complicate the reverse engineering and analysis of the code by adding lots of unnecessary and irrelevant computations. Most of the malicious samples found in the wild are obfuscated and equipped with complicated anti-analysis defenses intended to hide the malicious intent of the malware by defeating the analysis and/or increasing the analysis time. Deobfuscation (reversing the obfuscation) requires automatic techniques to extract the original logic embedded in the obfuscated code for further analysis. Presumably the deobfuscated code requires less analysis time and is easier to analyze compared to the obfuscated one. Previous approaches in this regard target specific types of obfuscations by making strong assumptions about the underlying protection scheme leaving opportunities for the adversaries to attack. This work addresses this limitation by proposing new program analysis techniques that are effective against code obfuscations while being generic by minimizing the assumptions about the underlying code. We found that standard program analysis techniques, including well-known data and control flow analyses and/or symbolic execution, suffer from imprecision due to the obfuscation and show how to mitigate this loss of precision. Using more precise program analysis techniques, we propose a deobfuscation technique that is successful in reversing the complex obfuscation techniques such as virtualization-obfuscation and/or Return-Oriented Programming (ROP).
|
7 |
Retrowrite: Statically Instrumenting COTS Binaries for Fuzzing and SanitizationSushant Dinesh (6640856) 10 June 2019 (has links)
<div>End users of closed-source software currently cannot easily analyze the security</div><div>of programs or patch them if flaws are found. Notably, end users can include devel</div><div>opers who use third party libraries. The current state of the art for coverage-guided</div><div>binary fuzzing or binary sanitization is dynamic binary translation, which results</div><div>in prohibitive overhead. Existing static rewriting techniques cannot fully recover</div><div>symbolization information, and so have difficulty modifying binaries to track code</div><div>coverage for fuzzing or add security checks for sanitizers.</div><div>The ideal solution for adding instrumentation is a static rewriter that can intel</div><div>ligently add in the required instrumentation as if it were inserted at compile time.</div><div>This requires analysis to statically disambiguate between references and scalars, a</div><div>problem known to be undecidable in the general case. We show that recovering this</div><div>information is possible in practice for the most common class of software and li</div><div>braries: 64 bit, position independent code. Based on our observation, we design a</div><div>binary-rewriting instrumentation to support American Fuzzy Lop (AFL) and Address</div><div>Sanitizer (ASan), and show that we achieve compiler levels of performance, while re</div><div>taining precision. Binaries rewritten for coverage-guided fuzzing using RetroWrite</div><div>are identical in performance to compiler-instrumented binaries and outperforms the</div><div>default QEMU-based instrumentation by 7.5x while triggering more bugs. Our im</div><div>plementation of binary-only Address Sanitizer is 3x faster than Valgrind memcheck,</div><div>the state-of-the-art binary-only memory checker, and detects 80% more bugs in our</div><div>security evaluation.</div>
|
8 |
INFERENCE OF RESIDUAL ATTACK SURFACE UNDER MITIGATIONSKyriakos K Ispoglou (6632954) 14 May 2019 (has links)
<div>Despite the broad diversity of attacks and the many different ways an adversary can exploit a system, each attack can be divided into different phases. These phases include the discovery of a vulnerability in the system, its exploitation and the achieving persistence on the compromised system for (potential) further compromise and future access. Determining the exploitability of a system –and hence the success of an attack– remains a challenging, manual task. Not only because the problem cannot be formally defined but also because advanced protections and mitigations further complicate the analysis and hence, raise the bar for any successful attack. Nevertheless, it is still possible for an attacker to circumvent all of the existing defenses –under certain circumstances.</div><div><br></div><div>In this dissertation, we define and infer the Residual Attack Surface on a system. That is, we expose the limitations of the state-of-the-art mitigations, by showing practical ways to circumvent them. This work is divided into four parts. It assumes an attack with three phases and proposes new techniques to infer the Residual Attack Surface on each stage.</div><div><br></div><div>For the first part, we focus on the vulnerability discovery. We propose FuzzGen, a tool for automatically generating fuzzer stubs for libraries. The synthesized fuzzers are target specific, thus resulting in high code coverage. This enables developers to expose and fix vulnerabilities (that reside deep in the code and require initializing a complex state to trigger them), before they can be exploited. We then move to the vulnerability exploitation part and we present a novel technique called Block Oriented Programming (BOP), that automates data-only attacks. Data-only attacks defeat advanced control-flow hijacking defenses such as Control Flow Integrity. Our framework, called BOPC, maps arbitrary exploit payloads into execution traces and encodes them as a set of memory writes. Therefore an attacker’s intended execution “sticks” to the execution flow of the underlying binary and never departs from it. In the third part of the dissertation, we present an extension of BOPC that presents some measurements that give strong indications of what types of exploit payloads are not possible to execute. Therefore, BOPC enables developers to test what data an attacker would compromise and enables evaluation of the Residual Attack Surface to assess an application’s risk. Finally, for the last part, which is to achieve persistence on the compromised system, we present a new technique to construct arbitrary malware that evades current dynamic and behavioral analysis. The desired malware is split into hundreds (or thousands) of little pieces and each piece is injected into a different process. A special emulator coordinates and synchronizes the execution of all individual pieces, thus achieving a “distributed execution” under multiple address spaces. malWASH highlights weaknesses of current dynamic and behavioral analysis schemes and argues for full-system provenance.</div><div><br></div><div>Our envision is to expose all the weaknesses of the deployed mitigations, protections and defenses through the Residual Attack Surface. That way, we can help the research community to reinforce the existing defenses, or come up with new, more effective ones.</div>
|
9 |
Development of a prototype taint tracing tool for security and other purposesKargén, Ulf January 2012 (has links)
In recent years there has been an increasing interest in dynamic taint tracing of compiled software as a powerful analysis method for security and other purposes. Most existing approaches are highly application specific and tends to sacrifice precision in favor of performance. In this thesis project a generic taint tracing tool has been developed that can deliver high precision taint information. By allowing an arbitrary number of taint labels to be stored for every tainted byte, accurate taint propagation can be achieved for values that are derived from multiple input bytes. The tool has been developed for x86 Linux systems using the dynamic binary instrumentation framework Valgrind. The basic theory of taint tracing and multi-label taint propagation is discussed, as well as the main concepts of implementing a taint tracing tool using dynamic binary instrumentation. The impact of multi-label taint propagation on performance and precision is evaluated. While multi-label taint propagation has a considerable impact on performance, experiments carried out using the tool show that large amounts of taint information is lost with approximate methods using only one label per tainted byte.
|
10 |
Compositional Decompilation using LLVM IREklind, Robin January 2015 (has links)
Decompilation or reverse compilation is the process of translating low-level machine-readable code into high-level human-readable code. The problem is non-trivial due to the amount of information lost during compilation, but it can be divided into several smaller problems which may be solved independently. This report explores the feasibility of composing a decompilation pipeline from independent components, and the potential of exposing those components to the end-user. The components of the decompilation pipeline are conceptually grouped into three modules. Firstly, the front-end translates a source language (e.g. x86 assembly) into LLVM IR; a platform-independent low-level intermediate representation. Secondly, the middle-end structures the LLVM IR by identifying high-level control flow primitives (e.g. pre-test loops, 2-way conditionals). Lastly, the back-end translates the structured LLVM IR into a high-level target programming language (e.g. Go). The control flow analysis stage of the middle-end uses subgraph isomorphism search algorithms to locate control flow primitives in CFGs, both of which are described using Graphviz DOT files. The decompilation pipeline has been proven capable of recovering nested pre-test and post-test loops (e.g. while, do-while), and 1-way and 2-way conditionals (e.g. if, if-else) from LLVM IR. Furthermore, the data-driven design of the control flow analysis stage facilitates extensions to identify new control flow primitives. There is huge potential for future development. The Go output could be made more idiomatic by extending the post-processing stage, using components such as Grind by Russ Cox which moves variable declarations closer to their usage. The language-agnostic aspects of the design will be validated by implementing components in other languages; e.g. data flow analysis in Haskell. Additional back-ends (e.g. Python output) will be implemented to verify that the general decompilation tasks (e.g. control flow analysis, data flow analysis) are handled by the middle-end. / <p>BSc dissertation written during an ERASMUS exchange from Uppsala University to the University of Portsmouth.</p>
|
Page generated in 0.0456 seconds