Abstract: As software vulnerabilities grow in severity, machine learning models are increasingly being adopted to combat this threat. Given the many ways such models can be applied, research in this area has introduced a variety of approaches. Although these models differ in performance, they generally lack explainability: it is unclear how a model learns and how it arrives at its predictions. Furthermore, recent research suggests that models that interpret source code as text, known as “text-based” models, perform poorly at detecting vulnerabilities. To help explain this poor performance, we explore the dimensions of explainability. Building on recent studies of text-based models, we experiment with removing overlapping features present in both the training and testing datasets, which we deem “cross-cutting”. We conduct scenario experiments in which this “cross-cutting” data is removed and model performance is reassessed, and we examine how its removal affects performance. Our results show that removing “cross-cutting” features may improve model performance in general, pointing to explainable dimensions regarding data dependency and agnostic models. Overall, we conclude that model performance can be improved and that explainable aspects of such models can be identified through empirical analysis of their performance.
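To make the “cross-cutting” removal step concrete, the following is a minimal sketch of filtering a training split so that no sample also appears in the test split. The column names (“code”, “label”), the whitespace normalization, and the hashing step are illustrative assumptions rather than the paper’s exact pipeline.

```python
# Sketch: drop training samples whose code also occurs in the test set
# ("cross-cutting" overlap between splits). Assumes pandas DataFrames with a
# "code" column; names and normalization are hypothetical.
import hashlib

import pandas as pd


def _fingerprint(code: str) -> str:
    """Hash a whitespace-normalized snippet so duplicates match cheaply."""
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def remove_cross_cutting(train_df: pd.DataFrame, test_df: pd.DataFrame,
                         code_col: str = "code") -> pd.DataFrame:
    """Return the training rows whose code does not appear in the test set."""
    test_hashes = set(test_df[code_col].map(_fingerprint))
    keep_mask = ~train_df[code_col].map(_fingerprint).isin(test_hashes)
    return train_df[keep_mask].reset_index(drop=True)


if __name__ == "__main__":
    train = pd.DataFrame({"code": ["int f() { return 0; }", "void g() { bar(); }"],
                          "label": [0, 1]})
    test = pd.DataFrame({"code": ["void g() { bar(); }"], "label": [1]})
    cleaned = remove_cross_cutting(train, test)
    print(len(train), "->", len(cleaned), "training samples after removal")
```

After filtering, the model would be retrained on the cleaned split and re-evaluated on the untouched test set, which is the comparison the scenario experiments rely on.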