Return to search

Neural Network-based Methodologies for Securing Cryptographic Code

Many studies show that manual code generation is error-prone and results in vulnerabilities. Vulnerability fixing has been shown as the most time-consuming process among multiple steps of code repair. To help developers repair these security vulnerabilities, my dissertation aims to develop an automatic or semi-automatic secure code generation system with neural network based approaches. Trained with huge amounts of good-quality code, I expect the neural network to learn the secure usage and produce the correct code suggestions.

Despite the great success of neural networks, the vision of comprehending and generating programming languages through neural networks has not been fully realized. There are many fundamental questions that need to be answered. These questions include 1) what are the accuracy impacts of the various choices in code embedding? 2) How to address the accuracy challenges caused by the programming language specific properties in the task of secure code suggestion? My dissertation work answers the two questions with a systematical measurement study and specialized neural network designs. My experiments show that program analysis is a necessary preprocessing step to guide the code embedding – resulting in a 36.1% accuracy improvement. Furthermore, I identify two previously unreported deficiencies in the cryptographic API suggestion task. To close the gap, I invent a highly accurate API method suggestion solution, referred to as Multi-HyLSTM, with specialized neural network designs to recognize unique programming language characteristics. My work points out the important differences between natural languages and programming languages, which pure data-driven learning approaches may not recognize. / Doctor of Philosophy / Neural network techniques that automatically learn rules from data show great potential to provide vulnerability-agnostic solutions for securing code. Recent research community has witnessed the rapid progress of neural network techniques in various application domains, such as computer vision, natural language processing, etc. However, how to harness the success of neural network based approaches for dealing with programs is still largely unknown. Many fundamental questions are required to be answered. This dissertation aims to provide neural network based solutions to help developers write secure code, as well as answer several important but unknown research questions about promoting neural network based approaches specialized for the programming language domain. Learning from Java cryptographic code, I explore the accuracy challenges for neural networks to understand the secure API usage rules and generate appropriate suggestions based on them. One of my research focuses is on how to express code in a way that neural networks can comprehend, aka code embedding. Code embedding is the process of transforming code into numeric vectors. It is important for accuracy as all the subsequent neural network calculation is performed on it. I conduct a systematic comparison to evaluate several key embedding design choices and reveal their impacts on accuracy improvements. To further improve the accuracy, I focus on the accuracy challenges in the specific task, generating API suggestions by neural networks. I identify the unreported program dependency specific challenges and present several specialized neural network designs to address them.

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/111543
Date17 August 2022
CreatorsXiao, Ya
ContributorsComputer Science and Applications, Yao, Danfeng, Hicks, Matthew, Ge, Xinyang, Ramakrishnan, Narendran, McDaniel, Patrick Drew
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
LanguageEnglish
Detected LanguageEnglish
TypeDissertation
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0024 seconds