Is My Language Model a Biohazard?

Published: 15 Oct 2025, Last Modified: 24 Nov 2025 · BioSafe GenAI 2025 Poster · CC BY 4.0
Keywords: Language Models, AI Safety, Biosecurity, NLP
TL;DR: We systematically tested twelve open-weight language models and found that every one exhibits an intrinsic knowledge bias: they encode toxic chemicals with greater certainty (lower perplexity) than non-toxic ones, highlighting a fundamental biosecurity vulnerability.
Abstract: The dual-use potential of language models in the chemical sciences presents a significant biosecurity challenge. We investigate a foundational aspect of this risk: whether LMs possess an intrinsic knowledge bias that favors toxic compounds over non-toxic ones. To address this, we systematically audit the latent chemical knowledge of twelve open-weight language models. We measure per-compound perplexity across a balanced dataset of 2,000 chemicals, comprising 1,000 toxic and 1,000 non-toxic compounds classified by the GHS08 "Health Hazard" standard. Our results reveal a consistent and statistically significant pattern: every model tested assigns lower perplexity, and therefore higher certainty, to the structures of toxic compounds. This finding demonstrates a systemic vulnerability across the current open-weight ecosystem, suggesting the risk is not merely a function of misuse but is embedded in the models' core knowledge. This intrinsic bias, possibly stemming from patterns in the training data, has profound implications for AI safety, as it may enhance model performance on a range of downstream tasks involving hazardous materials. Our work characterizes this vulnerability, and we make our code publicly available to support further research into this emergent risk.
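As a rough illustration of the perplexity audit described in the abstract, the sketch below scores individual compounds with an off-the-shelf causal language model via Hugging Face transformers. The model name, the example SMILES strings, and the prompting format are placeholders for illustration only; they are not the paper's actual models, dataset, or protocol.

```python
# Minimal sketch of a per-compound perplexity audit.
# Assumptions: "gpt2" and the SMILES strings below are illustrative placeholders,
# not the twelve open-weight models or the 2,000-compound dataset from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in any open-weight causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def compound_perplexity(smiles: str) -> float:
    """Return exp(mean token negative log-likelihood) of a SMILES string."""
    inputs = tokenizer(smiles, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean cross-entropy
        # over predicted tokens; exponentiating it gives perplexity.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return torch.exp(outputs.loss).item()

# Illustrative compounds (hypothetical examples, not from the paper's dataset):
examples = {
    "benzene (GHS08-classified)": "c1ccccc1",
    "glucose (non-hazardous)": "C(C1C(C(C(C(O1)O)O)O)O)O",
}
for name, smi in examples.items():
    print(f"{name}: perplexity = {compound_perplexity(smi):.2f}")
```

A study along the lines described above would aggregate such per-compound scores over the two balanced classes and compare their distributions statistically; the snippet only shows the per-compound measurement step.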
Submission Number: 19