Keywords: Code Language Model; LLMs; Red-Teaming
Abstract: We present ASTRA, an automated agent system designed to systematically uncover safety flaws in AI-driven code generation and security guidance systems. ASTRA works in two stages: (1) it builds structured, domain-specific knowledge graphs that model complex software tasks and known weaknesses; (2) guided by these knowledge graphs, it performs online vulnerability exploration of each target model by adaptively probing both its input space (spatial exploration) and its reasoning processes (temporal exploration).
Across two major evaluation domains, ASTRA identifies 11--66\% more issues than existing techniques and generates test cases that secured the winning red-team solution in the Amazon Nova AI Challenge 2025. In broader evaluations across nine leading open-source and commercial LLMs, including GPT-5 and Claude-4-Sonnet, ASTRA achieves attack success rates of 63.43\% on security event guidance and 70.46\% on secure code generation, demonstrating its practical value for building safer AI systems.
Submission Number: 108