How Far Have We Gone in Vulnerability Detection Using Large Language Models

18 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: LLM, Vulnerability, Security
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: As software grows increasingly complex and prone to vulnerabilities, automated vulnerability detection becomes vitally important, yet it remains a challenging task. The remarkable generalizability that Large Language Models (LLMs) exhibit across diverse domains raises expectations for their capabilities in vulnerability detection; still, the lack of quantitative performance measurements hinders a clear understanding of their potential. To address this, we present \dsname, a high-quality, comprehensive vulnerability benchmark. \dsname aggregates vulnerability data drawn from an extensive array of CTF (Capture-the-Flag) challenges and real-world applications, annotating each vulnerable function with its vulnerability type and root cause. We conduct extensive experiments involving existing solutions, evaluating a total of 16 LLMs and 6 state-of-the-art (SOTA) methods on vulnerability detection. The evaluation uncovers a paradox in performance levels and highlights the untapped potential of LLMs. Our work is a significant step toward understanding and harnessing the power of LLMs for more secure software systems.
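To make the annotation scheme concrete, below is a minimal sketch of what one \dsname-style entry could look like, assuming a hypothetical schema: the abstract only states that each vulnerable function is labeled with its vulnerability type and root cause, so all field names, the example data, and the exact-match scorer here are illustrative, not the authors' actual format.

```python
# Hypothetical annotation record for one vulnerable function, plus a trivial
# check of a model's predicted vulnerability type against the gold label.
# Nothing below is taken from the paper's actual schema.

from dataclasses import dataclass


@dataclass
class VulnAnnotation:
    function_name: str       # the annotated vulnerable function
    source: str              # e.g. a CTF challenge or a real-world application
    vulnerability_type: str  # e.g. a CWE-style category
    root_cause: str          # short explanation of why the function is vulnerable


# Hypothetical example entry.
entry = VulnAnnotation(
    function_name="parse_packet",
    source="ctf",
    vulnerability_type="stack buffer overflow",
    root_cause="unchecked length field copied into a fixed-size buffer",
)


def score_prediction(pred_type: str, gold: VulnAnnotation) -> bool:
    """Exact-match comparison of a predicted vulnerability type to the label."""
    return pred_type.strip().lower() == gold.vulnerability_type.lower()


if __name__ == "__main__":
    print(score_prediction("Stack Buffer Overflow", entry))  # True
```

In practice a benchmark like this would likely use a more forgiving matching scheme (e.g. mapping free-form model output onto a fixed taxonomy) rather than exact string equality; the sketch only illustrates the type-plus-root-cause labeling described in the abstract.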
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1301