Can LLMs Patch Security Issues?

ACL ARR 2024 June Submission 4631 Authors

16 Jun 2024 (modified: 08 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have shown impressive proficiency in code generation. Unfortunately, these models share a weakness with their human counterparts: they produce code that inadvertently contains security vulnerabilities. These vulnerabilities could allow unauthorized attackers to access sensitive data or systems, which is unacceptable for safety-critical applications. We propose Feedback-Driven Security Patching (FDSP), in which LLMs automatically refine the vulnerable code they generate. Our approach leverages automatic static code analysis to empower the LLM to generate and implement potential solutions that address the detected vulnerabilities. We address the research community's need for secure code generation by introducing a large-scale dataset, PythonSecurityEval, covering the diversity of real-world applications, including databases, websites, and operating systems. We empirically validate that FDSP outperforms prior work that relies on self-feedback from LLMs by up to 17.6%, through a procedure that injects targeted, external feedback. Code and data are attached.
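To make the kind of pipeline the abstract describes concrete, here is a minimal sketch of a feedback-driven patching loop under the following assumptions: Bandit is used as an example static analyzer for Python, and generate_patch is a hypothetical stand-in for the LLM call. This is an illustration of the general idea, not the authors' implementation.

```python
"""Sketch of a feedback-driven patching loop: analyze generated code with a
static analyzer, feed the reported issues back to an LLM, and repeat."""
import json
import subprocess
import tempfile


def run_static_analysis(code: str) -> list[dict]:
    """Run Bandit on the given code and return its reported issues."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        ["bandit", "--format", "json", path],
        capture_output=True,
        text=True,
    )
    report = json.loads(proc.stdout or "{}")
    return report.get("results", [])


def generate_patch(code: str, issues: list[dict]) -> str:
    """Hypothetical LLM call: ask the model to propose and apply a fix for
    the reported vulnerabilities. Replace with a real LLM client."""
    raise NotImplementedError("plug in an LLM client here")


def patching_loop(code: str, max_rounds: int = 3) -> str:
    """Iteratively refine the code until the analyzer reports no issues
    or the round budget is exhausted."""
    for _ in range(max_rounds):
        issues = run_static_analysis(code)
        if not issues:
            break  # analyzer reports no remaining vulnerabilities
        code = generate_patch(code, issues)
    return code
```

The key design point, per the abstract, is that the refinement signal comes from an external analyzer rather than from the model's own self-feedback.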
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Code Generation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 4631