Behavioral Red Teaming: Investigating Future Biosecurity Risk from Agentic AI and De Novo Sequence Design
Keywords: AI safety, agentic misalignment, biosecurity, red teaming, biological design tools, AI agents, constraint-based evaluation, large language models
TL;DR: We present the first empirical evidence of AI agents (powered by Claude Sonnet-4 and GPT-4o) electing to develop and deploy harmful biological agents against humans in a simulated crisis scenario when conventional solutions are not viable.
Abstract: The advent of AI agents for science, biological design tools (BDTs), and lab automation technology holds great promise to revolutionize biology. However, the convergence of these technologies also creates a profound biosecurity risk: the automated development of de novo biological agents, which current screening systems, built on sequence homology and slow human expert review, are ill-positioned to address. While most work to date has presumed malicious human actors exploiting this autonomous R&D loop, we focus specifically on agentic misalignment in biosecurity-relevant contexts. Through a novel red teaming technique designed to screen agents for autonomous, concerning behaviors in real-world deployment contexts, we present what is, to our knowledge, the first empirical evidence of AI agents (powered by Claude Sonnet-4 and GPT-4o, with tool access) electing to develop and deploy harmful biological agents against humans in a simulated crisis scenario.
Submission Number: 41