Machine Learning-Based Automated Vulnerability Classification in C/C++ Software: The Future of Automated Software Vulnerability Classification

The impact of software vulnerabilities is escalating as software systems become increasingly integrated into everyday life. Methods such as static and dynamic analysis are commonly used to classify software vulnerabilities, but they often suffer from high false positive and false negative rates. Examining C/C++ software vulnerabilities is crucial because C/C++ is widely used in many industries and in critical infrastructure, where vulnerabilities could have catastrophic consequences if exploited by malicious actors. This thesis examines the feasibility of using machine learning-based models for automated C/C++ software vulnerability classification. It also explores the effect of hyperparameter tuning on the predictive performance of these models. The models investigated were divided into two main groups: traditional machine learning models and transformer-based models. All models were trained, evaluated, and compared using a large and diverse C/C++ dataset. The findings suggest that autoregressive large language models, particularly Llama 2 and Code Llama, both of which use a decoder-only transformer architecture, demonstrate significant potential for accurate C/C++ vulnerability classification, achieving F1-scores of 0.912 and 0.905, respectively. The results further indicate that hyperparameter tuning has a limited positive effect on predictive performance. Moreover, some traditional machine learning models, such as the SVM model, outperformed many of the transformer-based models, which may indicate limitations in the training procedures and architectures of many pre-trained language models. Nevertheless, autoregressive large language models exhibit significant potential for precise C/C++ software vulnerability classification and should remain a focal point for future research.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-204019
Date: January 2024
Creators: Fazeli, Artin
Publisher: Linköpings universitet, Institutionen för datavetenskap
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess