Keywords: Proteins, Diffusion, Biosafety, Erasing, Generation
TL;DR: SafeGenie is a weight-level probability editing framework that proactively erases unsafe protein concepts (prions, disulfide bridges, etc.) from diffusion models, enabling biosafe protein generation without sacrificing quality or diversity.
Abstract: Generative diffusion models have rapidly advanced protein design, but their flexibility introduces biosafety risks: the same models that scaffold therapeutic enzymes can also produce prions, toxins, or other harmful proteins. Post-hoc defenses such as filters and classifiers are brittle and vulnerable to jailbreak-style prompting. We introduce SafeGenie, a weight-level erasure framework that reshapes the model’s probability distribution to proactively suppress unsafe concepts, making the resulting generators resilient to inference-time attacks. Through targeted experiments, we show that SafeGenie can reduce the likelihood of generating structural motifs such as $\alpha$-helices, eliminate prion-like aggregation signals, and lower the rate of predicted toxic peptides, all while preserving designability and diversity. We further construct a unified SafeGenie model by erasing 1,450 PDB-labeled toxins, demonstrating that large-scale distributional erasure yields a generator that reliably avoids unsafe sequences without degrading overall protein quality. Our results establish weight-space probability editing as a principled, robust, and practical tool for biosafety in generative biology.
Submission Number: 50