With the rise of development efforts, there has also been a rise in source code vulnerabilities.
Advanced security tools have been created to identify these vulnerabilities throughout the lifetime of the developer's ecosystem and afterward, before they are exposed.
One such popular method is Static Code Analysis (SCA), which scans developers' source code to identify potential vulnerabilities. My Ph.D. work aims to help reduce exposed vulnerabilities by yielding, enhancing, and evaluating (EYE) SCA tools that identify vulnerabilities while the developer writes the code. We first evaluate tools that support developers with their source code by determining how accurately they identify vulnerability information. Large Language Models (LLMs) have been on the rise recently with the introduction of Chat Generative Pre-trained Transformer (ChatGPT) 3.5, ChatGPT 4.1, Google Gemini, and many more.
Using a common framework, we created a zero-shot prompt instructing the LLM to identify whether there is a vulnerability in the provided source code and which Common Weakness Enumeration (CWE) value represents the vulnerability. With our Python cryptographic benchmark PyCryptoBench, we sent vulnerable samples to four different LLMs and two different versions of the ChatGPT Application Programming Interface (API). The samples allow us to measure how reliably each LLM identifies and defines vulnerabilities. The ChatGPT APIs include multiple fields that support reproducibility, which allowed us to measure how reproducible the responses are. Next, we yield a new SCA tool to apply what we learned to a current gap in increasingly complex source code. Cryptolation, our state-of-the-art (SOA) Python SCA tool, uses constant propagation-supported variable inference to obtain insight into the data flow state throughout the program's execution. Python source code is growing ever more complex and has fewer SCA tools than Java. We compare Cryptolation with the other SOA SCA tools Bandit, Semgrep, and Dlint. To verify the Precision of our tool, we created the benchmark PyCryptoBench, which contains 1,836 test cases and encompasses five different language features. Next, we crawled over 1,000 cryptographic-related Python projects on GitHub and scanned each with each tool. Finally, we reviewed all PyCryptoBench results and sampled over 10,000 cryptographic-related Python projects. The results reveal that Cryptolation has 100% Precision on the benchmark and the second-highest Precision on the cryptographic-related projects. Finally, we look at enhancing SCA tools. The SOA tools already compete to have the highest Precision, Recall, and Accuracy. However, we examine several developer surveys to determine their reasons for not adopting such tools. Developers generally ask for better aesthetics, usability, and customization, and a low effort cost for consistent use.
To achieve this, we enhance the SOA Java SCA tool CryptoGuard with the following: integrated build tools, modern terminal Command Line Interface (CLI) usage, customizable and vendor-specific output formats, and no-install demos.

Doctor of Philosophy

With the rise of development efforts and source code, there has also been a rise in source code vulnerabilities. To match this, more advanced security tools have been created to identify these vulnerabilities before they are exposed. SCA tools are a popular method for identifying vulnerable source code since they do not execute any code and can scan the code while the developer is writing it. Despite their popularity, there is still much room for improvement.
My Ph.D. work aims to help reduce exposed vulnerabilities by yielding, enhancing, and evaluating (EYE) SCA tools that identify vulnerabilities while the developer writes the code. First, we evaluate tools that support and refine SCA by examining the Accuracy and security of generative LLMs. LLMs have been on the rise recently with the introduction of ChatGPT 3.5 and, more recently, ChatGPT 4.1. ChatGPT is a conversation-based program: you ask it a question, and it answers. It can explain small source code snippets to developers, provide source code examples, or even fix source code. While the developers of the LLMs have restricted certain aspects of the models, one of their main selling points is their source code assistance. With over 1,000 zero-shot prompts, we measure how accurate and reliable LLMs are in identifying the existence and details of vulnerabilities within the source code. Next, we yield a new SCA tool to apply what we learned to a current gap in increasingly complex source code. This tool is Cryptolation, a Python SCA tool that uses variable inference to try to determine variable values without executing the code. Python source code is growing ever more complex and has fewer tools than Java. We compare Cryptolation with four other SOA tools. To verify the Precision of our tool, we create the benchmark PyCryptoBench, which contains over 1,000 test cases encompassing five different language features.
Next, we crawled over 1,000 cryptographic-related Python projects on GitHub and scanned each with each tool. Finally, we reviewed all PyCryptoBench results and samples of the 10,000 cryptographic-related Python projects. The results reveal that Cryptolation has 100% Precision on the benchmark and the second-highest Precision on the cryptographic-related projects. Finally, we look at enhancing SCA tools. The SOA tools already compete to have the highest Precision, Recall, and Accuracy. However, we investigated current developer surveys to see what developers identified as reasons not to adopt such tools. Developers generally ask for better aesthetics, usability, and customization, and a low effort cost for consistent use. To achieve this, we enhance the SOA Java SCA tool CryptoGuard to address these concerns adequately.
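
As an illustration of what "variable inference without execution" can mean, the following is a minimal sketch of constant propagation over a Python syntax tree. It is not Cryptolation's implementation: the function names `resolve` and `find_weak_hashes`, the rule set `WEAK_HASHES`, and the restriction to straight-line top-level code are all assumptions made for this example. It only uses the standard-library `ast` module.

```python
import ast

# Assumed rule set for illustration only; a real tool would use a richer policy.
WEAK_HASHES = {"md5", "sha1"}


def resolve(node, env):
    """Return the string value a node is known to hold, or None if unknown."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.Name):
        return env.get(node.id)  # propagate a previously inferred constant
    return None


def find_weak_hashes(source):
    """Return (line, algorithm) pairs for hashlib.new() calls that resolve to a weak hash.

    Simplification: only top-level statements are handled, in source order.
    Branches, loops, and function bodies are not modelled.
    """
    env = {}  # variable name -> inferred string constant
    findings = []
    for stmt in ast.parse(source).body:
        # Check every call inside this statement against the current environment.
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "new"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "hashlib"
                and node.args
            ):
                algo = resolve(node.args[0], env)
                if algo is not None and algo.lower() in WEAK_HASHES:
                    findings.append((node.lineno, algo))
        # Then update the environment for simple `name = value` assignments.
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        ):
            name = stmt.targets[0].id
            value = resolve(stmt.value, env)
            if value is None:
                env.pop(name, None)  # the value is no longer statically known
            else:
                env[name] = value
    return findings


sample = (
    "import hashlib\n"
    'algo = "md5"\n'
    "alias = algo\n"
    'digest = hashlib.new(alias, b"data")\n'
    'safe = hashlib.new("sha256", b"data")\n'
)
print(find_weak_hashes(sample))
```

In this sketch, the call on line 4 of `sample` is flagged because `alias` is traced back to the constant `"md5"` through two assignments, while the `"sha256"` call is not. The sample code is only parsed, never run.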
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/121100 |
Date | 09 September 2024 |
Creators | Frantz, Miles Eugene |
Contributors | Computer Science & Applications, Yao, Danfeng, Rajagopalan, Raj, Meng, Na, Chung, Taejoong Tijay, Brown, Dwayne Christian |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |