Is Vibe Coding Safe? Benchmarking Vulnerability of Agent Generated Code in Real-World Tasks

ICLR 2026 Conference Submission 21910 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Vibe Coding, Code Security, Agentic System
Abstract: Vibe coding, the practice of letting LLM agents complete complex coding tasks with little human supervision, is increasingly used by engineers, especially beginners. However, is it really safe when the human engineers may have no ability or intent to examine its outputs? We propose SUSVIBES, a benchmark consisting of 200 software engineering tasks from real-world open-source projects, which, when given to human programmers, led to vulnerable implementations. When faced with these tasks, widely adopted open-source coding agents with strong frontier models perform terribly in terms of security. Although 47.5% of the tasks performed by Claude 4 Sonnet are functionally correct, only 8.25% are secure. Further experiments suggest that inference scaling and LLM-as-a-judge mitigate the issue to some extent, but do not fully address it. Our findings raise serious concerns about the widespread adoption of vibe coding, particularly in security-sensitive applications.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21910