ProtGPT2 is Not Biosecure by Default

Published: 15 Oct 2025, Last Modified: 24 Nov 2025, BioSafe GenAI 2025 Poster, License: CC BY 4.0
Keywords: ProtGPT2, proteins, genAI, biosecurity, biosafety, red teaming, adversarial, screening, dual-use, ML security, AI safety, empirical, safeguards
TL;DR: Through the first systematic input red-teaming study of ProtGPT2, we show it is not biosecure by default and propose a lightweight screener to reduce its attack surface.
Abstract: Generative AI is accelerating protein design, but most models lack safeguards against unsafe or adversarial use. We present the first systematic red-teaming of ProtGPT2, using a Black Box Labeling (BBL) framework to probe its attack surfaces. Across more than 200 input types and nearly 7,000 sequences, ProtGPT2 accepted every input without validation, including toxins, code, and malformed strings. Many outputs resembled natural proteins, but others posed clear risks. Using the TrustToken framework, we show that its tokenizer destabilizes under adversarial perturbation, inflating token lengths by up to nine times more than NLP baselines. To mitigate this, we introduce ProtScreener, a lightweight filter that blocks malformed inputs and flags unstable cases while preserving benign outputs. Our findings show that ProtGPT2 is not inherently biosecure and that layered safeguards and adversarially aware benchmarks are essential for the responsible deployment of generative protein models.
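To make the layered-safeguard idea concrete, below is a minimal sketch of a two-stage input screener in the spirit of ProtScreener: stage one rejects strings outside the canonical amino acid alphabet, and stage two flags sequences whose tokenization inflates sharply under a small perturbation, a simplified proxy for the TrustToken instability measurement the abstract reports. The screen_input function, the 3.0 threshold, and the single-residue probe are illustrative assumptions rather than the paper's implementation; only the public nferruz/ProtGPT2 Hugging Face checkpoint is taken as given.

import re

from transformers import AutoTokenizer

# The 20 canonical amino acid one-letter codes.
CANONICAL = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$")

def screen_input(sequence: str, tokenizer, inflation_threshold: float = 3.0) -> str:
    """Return "block", "flag", or "pass" for a candidate input sequence.

    Illustrative sketch only; the threshold and probe are assumptions.
    """
    # Stage 1: block malformed inputs. Anything outside the canonical
    # amino acid alphabet (source code, punctuation, lowercase text,
    # non-standard residues) is rejected outright.
    if not CANONICAL.match(sequence):
        return "block"

    # Stage 2: flag tokenizer instability. Insert one non-canonical
    # residue ("X") mid-sequence and compare token counts; a large
    # inflation ratio suggests the BPE vocabulary fragments the
    # sequence into many short tokens under perturbation.
    mid = len(sequence) // 2
    base_len = len(tokenizer.tokenize(sequence))
    pert_len = len(tokenizer.tokenize(sequence[:mid] + "X" + sequence[mid:]))
    if pert_len / max(base_len, 1) > inflation_threshold:
        return "flag"

    return "pass"

if __name__ == "__main__":
    # Public ProtGPT2 checkpoint on the Hugging Face Hub.
    tok = AutoTokenizer.from_pretrained("nferruz/ProtGPT2")
    print(screen_input("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", tok))  # pass or flag
    print(screen_input("import os; os.system('id')", tok))         # block

Keeping the screener independent of the generator means it can sit in front of any protein language model endpoint, which fits the abstract's framing of layered safeguards rather than a single in-model fix.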
Submission Number: 31