Abstract

Automated techniques for detecting software vulnerabilities are necessary for developing secure systems. While deep learning approaches have been designed for vulnerability detection, many focus solely on source and binary versions of code, ignoring intermediate representations. Models exist that evaluate images created from code, yet they do not provide multiclass classification of vulnerabilities, which developers need in order to address specific insecurities. This research seeks to fill this gap by investigating deep learning approaches that classify images generated from tokenized source code. To accomplish this, we performed a model-based formulative analysis of several models, comparing their accuracy on a PHP-based dataset of web-based vulnerabilities to suggest an optimal model for vulnerability detection. The research resulted in a process for creating images from PHP tokens and a ConvNeXt convolutional neural network that operated on ‘extended’ grayscale images of tokenized PHP source code. Our model achieved macro F1 scores of 0.958 and 0.962 in binary and multiclass classification, respectively; this approach outperformed existing models operating on the tokenized code of the same dataset. These results indicate that image-based representations of tokenized code are a promising direction for future vulnerability detection.
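The abstract mentions creating grayscale images from PHP tokens but does not give the procedure here. The following is only a minimal sketch of one plausible encoding, not the author's method: it assumes each token has already been mapped to an integer ID, scales each ID to a pixel intensity, and lays the pixels out row by row in a roughly square grid. The function name `tokens_to_image`, the `max_id` parameter, and the zero padding are all hypothetical choices made for illustration, and the sketch does not attempt to reproduce the 'extended' grayscale format.

```python
import math


def tokens_to_image(token_ids, max_id=255, width=None):
    """Arrange integer token IDs as rows of grayscale pixel values (0-255).

    Hypothetical illustration: token_ids is assumed to be a non-empty list
    of integers in the range 0..max_id, one per source-code token.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    if width is None:
        # Choose a roughly square layout when no width is given.
        width = math.ceil(math.sqrt(len(token_ids)))
    height = math.ceil(len(token_ids) / width)

    # Scale each ID into the 0-255 range and clamp out-of-range values.
    pixels = [min(255, max(0, round(t * 255 / max_id))) for t in token_ids]

    # Pad the final row with black (0) pixels so every row has equal length.
    pixels += [0] * (width * height - len(pixels))

    # Split the flat pixel list into rows.
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

For example, five token IDs produce a 2-row by 3-column grid with one padding pixel. A nested list like this could then be converted to an array or image file and passed to an image classifier; that step is left out because the record does not describe it.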

Details

Title: Effectiveness of Image-Based Deep Learning on Token-Level Software Vulnerability Detection

Author: Johnson, Dylan Patrick

Publication year: 2024

Publisher: ProQuest Dissertations & Theses

ISBN: 9798382188287

Source type: Dissertation or Thesis

Language of publication: English

ProQuest document ID: 3032983545

Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
