Keywords: deep learning, security, vulnerability detection
TL;DR: We present vulnerability detection as a challenging code reasoning task for Large Language Models.
Abstract: In this paper, we present a challenging code reasoning task: vulnerability detection.
Large Language Models (LLMs) have shown promising results in natural language
and math reasoning, but state-of-the-art (SOTA) models achieved only 54.5%
Balanced Accuracy in our vulnerability detection evaluation, even models
pre-trained on large amounts of source code. Our error analysis on LLM responses
shows that the models struggle to reason about the code semantics relevant to
identifying vulnerabilities, especially subtle semantic differences caused by small
textual changes. We explored prominent models and training settings to understand
their effects on vulnerability detection performance, including better prompts,
larger models, more pre-training data, and fine-tuning, but none led to significant
improvements. This raises the question of whether simply scaling training data and
model size will allow us to “solve” complex code reasoning tasks like vulnerability
detection, or if a fundamental shift in modeling and training techniques is required.
We also explored adding domain knowledge to prompts; although it helped certain
models understand some code semantics, vulnerability detection requires
multi-step reasoning, and the models still failed at intermediate steps, such as
reasoning about relationships between variables. Our results suggest that new models, new training methods, or
more execution-specific pre-training data may be needed to master vulnerability
detection. We speculate that auto-regressive pre-training on source code may not
effectively extract code semantics, especially with current pre-training mixtures,
in which execution data is scarce. Success on vulnerability detection as a code
reasoning task can benefit many areas of software engineering, such as debugging,
test input generation, and program repair. Our code and data are available at
https://figshare.com/s/78fe02e56e09ec49300b.
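To make concrete the kind of "subtle semantic difference caused by a small textual change" the abstract refers to, here is a minimal illustrative sketch (a hypothetical example, not taken from the paper or its dataset): a single-character edit that turns a safe loop into an out-of-bounds write.

```c
#include <stddef.h>

/* Illustrative only: using `<=` instead of `<` lets the loop write one
 * element past the end of `buf`, an off-by-one out-of-bounds write
 * (CWE-193 / CWE-787). Detecting this requires reasoning about the
 * relationship between the loop index and the buffer bound. */
void zero_fill(int *buf, size_t len) {
    for (size_t i = 0; i <= len; i++) {   /* BUG: should be `i < len` */
        buf[i] = 0;                       /* writes buf[len], past the end */
    }
}
```

The patched and vulnerable versions differ by one token, which is exactly the sort of distinction the evaluated models frequently missed.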
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8194