Statement-Level Adversarial Attack on Vulnerability Detection Models via Out-of-Distribution Features

Published: 01 Jan 2025, Last Modified: 11 Aug 2025 · Proc. ACM Softw. Eng. 2025 · CC BY-SA 4.0
Abstract: Code vulnerability detection is crucial to ensuring software security. Recent advancements, particularly the emergence of Code Pre-Trained Models (CodePTMs) and Large Language Models (LLMs), have led to significant progress in this area. However, these models are highly susceptible to adversarial attacks, where even slight input modifications can cause them to produce the opposite prediction. Existing adversarial approaches, such as identifier replacement, code transformation, and dead code insertion, demonstrate promising performance but still face several limitations. First, the perturbations applied to the target code are relatively constrained (e.g., identifier replacement can only be applied to a small subset of tokens within the entire codebase). Second, the design of perturbed tokens lacks specificity in forcing the model to make incorrect predictions (e.g., they are generated by random selection or context-based prediction). These limitations render existing attacks inefficient and ineffective. To address these issues, we propose SLODA (Statement-level OOD Features driven Adversarial Attack), which introduces two types of out-of-distribution (OOD) features: universal features obtained via code deoptimization and label-specific features extracted from existing mispredicted and adversarial examples. These statement-level OOD features not only expand the perturbation scope but also significantly reduce the search space due to their inherently adversarial nature. Moreover, because the OOD features are extracted from existing code and the attack considers the context of the target code, they are more difficult to detect. Our extensive experiments across 15 models demonstrate that SLODA surpasses five existing state-of-the-art approaches in terms of effectiveness, efficiency, and detection resistance. Furthermore, the adversarial examples generated by SLODA also show promise for enhancing model robustness.
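To make the attack setting concrete, the following is a minimal, hypothetical sketch of a greedy statement-level insertion attack against a black-box vulnerability detection classifier. It is not the authors' SLODA implementation: the `model_predict` interface, the candidate statement pool, and the insertion strategy are illustrative assumptions only.

```python
# Hypothetical sketch of a greedy statement-level insertion attack.
# NOT the SLODA implementation; model_predict, the candidate statement pool,
# and the insertion positions are assumptions made for illustration.
from typing import Callable, List, Tuple


def greedy_statement_attack(
    code_lines: List[str],
    candidate_statements: List[str],  # e.g., semantics-preserving statements drawn from an OOD pool
    model_predict: Callable[[str], Tuple[int, float]],  # returns (predicted_label, confidence)
    max_insertions: int = 3,
) -> Tuple[List[str], bool]:
    """Greedily insert candidate statements until the model's label flips."""
    original_label, _ = model_predict("\n".join(code_lines))
    lines = list(code_lines)

    for _ in range(max_insertions):
        best = None  # (confidence_in_original_label, position, statement)
        for stmt in candidate_statements:
            for pos in range(1, len(lines)):  # skip the function signature line
                trial = lines[:pos] + [stmt] + lines[pos:]
                label, conf = model_predict("\n".join(trial))
                if label != original_label:
                    return trial, True  # successful adversarial example
                if best is None or conf < best[0]:
                    best = (conf, pos, stmt)  # remember the insertion that lowers confidence most
        if best is None:
            break
        _, pos, stmt = best
        lines = lines[:pos] + [stmt] + lines[pos:]  # commit the best insertion and repeat

    return lines, False
```

In a realistic instantiation of the idea described in the abstract, the candidate pool would be populated with deoptimized code fragments or statements harvested from mispredicted and adversarial examples, and the search would additionally reject insertions that break compilation or alter the program's semantics.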