Keywords: Code Language Model; LLMs; Red-Teaming
Abstract: We present ASTRA, an automated agent system designed to systematically uncover safety flaws in AI-driven code generation and security guidance systems. ASTRA works in two stages: (1) it builds structured, domain-specific knowledge graphs that model complex software tasks and known weaknesses; (2) guided by these knowledge graphs, it performs online vulnerability exploration of each target model by adaptively probing both its input space (spatial exploration) and its reasoning processes (temporal exploration).
Across two major evaluation domains, ASTRA identifies 11--66\% more issues than existing techniques and generates test cases that secured the winning red-team solution in the Amazon Nova AI Challenge 2025. In broader evaluations across nine leading open-source and commercial LLMs, including GPT-5 and Claude-4-Sonnet, ASTRA achieves attack success rates of 63.43\% on security event guidance and 70.46\% on secure code generation, demonstrating its practical value for building safer AI systems.
Submission Number: 108