SafeGenie: Erasing Dangerous Concepts from Biological Diffusion Models

Arjun Banerjee; Ethan Tam; Camille Dang; David Martinez

Published: 15 Oct 2025, Last Modified: 24 Nov 2025

Venue: BioSafe GenAI 2025 (Oral)

License: CC BY 4.0

Keywords: Proteins, Diffusion, Biosafety, Erasing, Generation

TL;DR: SafeGenie is a weight-level probability editing framework that proactively erases unsafe protein concepts (prions, disulfide bridges, etc.) from diffusion models, enabling biosafe protein generation without sacrificing quality or diversity.

Abstract: Generative diffusion models have rapidly advanced protein design, but their flexibility introduces biosafety risks: the same models that scaffold therapeutic enzymes can also produce prions, toxins, or other harmful proteins. Post-hoc defenses like filters and classifiers are brittle and vulnerable to jailbreak-style prompting. We introduce SafeGenie, a weight-level erasure framework that reshapes the model’s probability distribution to proactively suppress unsafe concepts, making the resulting generators resilient to inference-time attacks. Through targeted experiments, we show that SafeGenie can reduce the likelihood of generating structural motifs such as $\alpha$-helices, eliminate prion-like aggregation signals, and lower toxic peptide predictions, all while preserving designability and diversity. We further construct a unified SafeGenie model by erasing 1,450 PDB-labeled toxins, demonstrating that large-scale distributional erasure yields a generator that reliably avoids unsafe sequences without degrading overall protein quality. Our results establish weight-space probability editing as a principled, robust, and practical tool for biosafety in generative biology.

Submission Number: 50
