
An analysis of text-based machine learning models for vulnerability detection

Napier, Kollin Ryne 12 May 2023
As software grows more complex, developers rely increasingly on reuse and dependencies, often incorporated into their source code as code snippets. As a result, vulnerabilities are becoming harder to identify and mitigate. Although traditional analysis tools are still in use, machine learning models are being adopted to extend these efforts and combat such threats. Research in this area has introduced various approaches to applying such models, which vary in usability and prediction. In generalizing models toward a more natural language approach, researchers have opted to train models on source code to identify existing and potential vulnerabilities. Exploratory research has treated source code as plain text, creating "text-based" models. Motivated by the goal of preventing vulnerable code snippets, we present a dissertation on the effectiveness of text-based machine learning models for vulnerability detection. We use datasets composed of open-source projects and vulnerability types to generate our own training and testing data from extracted function pairings. Using this data, we evaluate a series of text-based machine learning models, coupled with natural language processing (NLP) techniques and our own data processing methods. Through empirical research, we demonstrate the effectiveness of such models based on statistical evidence. From these results, we determine negative correlations and identify "cross-cutting" features. Finally, we present an analysis of models with "cross-cutting" features removed, to improve performance while providing explainability for model decisions.
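To make the idea of a "text-based" model concrete, the following is a minimal sketch and not the dissertation's method. It assumes that function pairings are vulnerable/patched versions of the same function, uses a hand-written multinomial naive Bayes classifier as a stand-in for the models evaluated, and trains on four hypothetical C snippets invented for illustration. The names `tokenize` and `NaiveBayesTextClassifier` are likewise illustrative.

```python
import math
import re
from collections import Counter


def tokenize(source: str) -> list[str]:
    # Treat code as plain text: identifiers, numbers, and single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over token counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.token_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.token_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy function pairings (assumption: vulnerable vs. patched versions).
pairs = [
    ("void copy(char *dst, char *src) { strcpy(dst, src); }", "vulnerable"),
    ("void copy(char *dst, char *src, size_t n) { strncpy(dst, src, n); dst[n - 1] = 0; }", "patched"),
    ("void read_name(char *buf) { gets(buf); }", "vulnerable"),
    ("void read_name(char *buf, size_t n) { fgets(buf, n, stdin); }", "patched"),
]
model = NaiveBayesTextClassifier().fit([t for t, _ in pairs], [l for _, l in pairs])
print(model.predict("void f(char *a, char *b) { strcpy(a, b); }"))
```

A model of this kind sees only token frequencies, so tokens that appear in both classes (such as `void` or `char` here) carry little signal on their own. Inspecting per-token counts in `model.token_counts` is one simple way to look for such shared tokens in a toy setting.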
