BlueCodeAgent: A Blue Teaming Agent Powered by Automated Red Teaming for CodeGen AI

Published: 01 Mar 2026, Last Modified: 24 Apr 2026ICLR 2026 AIWILDEveryoneRevisionsCC BY 4.0
Keywords: LLM, Agent, Code generation, safety, security
Abstract: Existing research on CodeGen AI security mainly focuses on red teaming, which aims to uncover vulnerabilities and risks in AI-generated code. However, progress on the blue teaming side remains limited, as effective defenses require a deep security analysis of given tasks and edge cases. To fill in this gap, we propose BlueCodeAgent, an end-to-end blue teaming agent system powered by automated red teaming. Our red teaming component generates diverse risky instances, providing effective edge cases and guidance for the subsequent blue teaming process. Our blue teaming agent then conducts multi-level defense, leveraging these red teaming examples to detect previously seen and unseen risk scenarios through constitution summarization and dynamic code analysis. Our evaluation across four representative code-related tasks–bias instruction detection, malicious instruction detection, vulnerable code detection, and prompt injection detection–shows that BlueCodeAgent achieves significant gains over diverse baselines. In particular, for vulnerability detection tasks, BlueCodeAgent integrates dynamic analysis to effectively reduce false positives, a challenging problem as base models tend to be over-conservative. Overall, with GPT-4o as the base model, BlueCodeAgent achieves an average F1 score improvement of 14.7% across four tasks compared to directly prompting the model, attributed to its ability to summarize actionable constitutions and perform dynamic analysis.
PDF: pdf
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 29
Loading