Global ETD Search


Machine Learning-Based Automated Vulnerability Classification in C/C++ Software: The Future of Automated Software Vulnerability Classification

The impact of software vulnerabilities is growing as software systems become increasingly integrated into everyday life. Methods such as static and dynamic analysis are commonly used to classify software vulnerabilities, but they suffer from limitations, including high false positive and false negative rates. Examining vulnerabilities in C/C++ software is especially important, because C/C++ is widely used across many industries and in critical infrastructure, where vulnerabilities could have catastrophic consequences if exploited by malicious actors. This thesis examines the feasibility of using machine learning-based models for automated C/C++ software vulnerability classification, and additionally explores how hyperparameter tuning affects the predictive performance of these models. The models investigated fall into two main groups: traditional machine learning models and transformer-based models. All models were trained, evaluated, and compared on a large and diverse C/C++ dataset. The findings suggest that autoregressive large language models, particularly Llama 2 and Code Llama, both of which use a decoder-only transformer architecture, show significant potential for accurate C/C++ vulnerability classification, achieving F1-scores of 0.912 and 0.905, respectively. The results further indicate that hyperparameter tuning has only a limited positive effect on predictive performance. Moreover, some traditional machine learning models, such as the support vector machine (SVM) model, outperformed many of the transformer-based models, which may point to limitations in the training procedures and architectures of many pre-trained language models. Nevertheless, autoregressive large language models exhibit significant potential for precise C/C++ software vulnerability classification and should remain a focal point of future research.
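The F1-scores quoted in the abstract combine precision P and recall R into a single number, F1 = 2PR / (P + R), so a classifier is penalized both for false positives and for false negatives, the same two failure modes attributed to static and dynamic analysis. The minimal Python sketch below computes a binary F1-score from scratch to show what such a figure measures. The function name `binary_f1`, the label encoding (1 = vulnerable, 0 = not vulnerable), and the sample labels are hypothetical illustrations; this record does not describe how the thesis computed its metrics.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1-score: the harmonic mean of precision and recall.

    Hypothetical helper; labels are assumed to be 1 for "vulnerable"
    and 0 for "not vulnerable" (an illustrative encoding only).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: vulnerable functions correctly flagged.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: safe functions wrongly flagged as vulnerable.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: vulnerable functions the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Define F1 as 0.0 when both precision and recall are zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels for ten C/C++ functions (1 = vulnerable).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    print(f"F1 = {binary_f1(y_true, y_pred):.3f}")  # prints "F1 = 0.800"
```

In this made-up example the classifier finds 4 of the 5 vulnerable functions (recall 0.8) and 4 of its 5 alarms are correct (precision 0.8), giving F1 = 0.8. A single missed vulnerability or a single false alarm lowers the score, which is why F1 is a common summary metric for this kind of binary classification.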

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-204019

Natural Language Processing

Software Engineering

Programvaruteknik (Software Engineering)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-204019
Date	January 2024
Creators	Fazeli, Artin
Publisher	Linköping University, Department of Computer and Information Science (Linköpings universitet, Institutionen för datavetenskap)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
