Biorisk-Shift: Converting AI Vulnerabilities into Biological Threat Vectors

Published: 15 Oct 2025, Last Modified: 24 Nov 2025 · BioSafe GenAI 2025 Poster · CC BY 4.0
Keywords: biosafety guardrails, AI safety, jailbreak attacks, datasets, multi-turn jailbreaks
TL;DR: This paper presents a crowdsourced multi-turn jailbreak dataset and transformation protocols showing how conversational failures in frontier language models can escalate into high-stakes biorisk, bypassing biosafety safeguards at a rate above 50%.
Abstract: Generative AI models are increasingly applied in biotechnological domains, yet existing safeguards fail to account for how diverse conversational failure modes can be repurposed into biorisk-relevant attacks. We present a crowdsourced, multi-turn jailbreak dataset collected from non-technical university students who became effective red teamers after 5+ hours of training, together with transformation protocols that systematically identify safety vulnerabilities in frontier language models. Central to our study is a domain-targeted Biorisk-Shift transformation that leverages this dataset to convert general jailbreak patterns into high-stakes biological contexts, achieving a 53.5% bypass rate against biosafety guardrails. Complementary transformations, including Attack Enhancement and Failure Root-Cause Iteration, further expand the range of elicited harmful outputs. Benchmarks against defense-filtered models show that even state-of-the-art safeguards can be circumvented, underscoring how ordinary conversational exploits can escalate into risks for protein design, genome editing, and molecular synthesis. Our findings demonstrate the need for biosecurity-specific evaluation methods that draw on a large contributor base, and for integrated safeguards that directly address the translation of everyday model failures into extreme biological threats.
Submission Number: 15