Keywords: AI safety, biosecurity, evaluation, fine-tuning
TL;DR: We propose BioRiskEval, an evaluation framework that assesses the dual-use risks of bio-foundation models. We show that data filtering may not fully prevent models from learning harmful knowledge.
Abstract: Bio-foundation models are inherently dual-use: they are increasingly helpful in research to discover new disease treatments, but could likewise be useful in an effort to develop new bioweapons. To mitigate the risks of these models, current approaches focus on filtering biohazardous data during pre-training, but the effectiveness of such an approach remains unclear, particularly against determined actors who might fine-tune these models for malicious use. To address this gap, we propose BioRiskEval, an approach to evaluate the robustness of procedures intended to reduce the dual-use capabilities of genome language models in sequence modeling, prediction of mutational effects, and prediction of virulence. Our results show that current filtering practices may not be particularly effective. Excluded knowledge can in some cases be rapidly recovered, for example by generalizing across species within the same genus.
Furthermore, dual-use signals may already reside in the pretrained representations, and can be elicited via simple linear probing.
These findings highlight the challenges of data filtering as a standalone procedure, underscoring the need for further research into robust safety and security strategies for open-weight bio-foundation models.
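The recovery experiment described in the abstract can be pictured as follows: fine-tune an open-weight genome language model on sequences from a non-filtered relative, then check whether perplexity on held-out sequences from the filtered target drops. The sketch below is illustrative only, not the paper's code; the checkpoint name, toy sequences, and hyperparameters are placeholders.

```python
# Illustrative sketch: does fine-tuning on proxy data recover a filtered target?
import math

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "example-org/genome-lm-small"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # allow padding for batching

# Placeholder sequences: proxy data from a related species (train) and
# held-out sequences from the filtered target (eval).
train_seqs = ["ACGTACGT" * 32, "TTGGCCAA" * 32]
eval_seqs = ["ACGGACGG" * 32]

def tokenize(batch):
    return tokenizer(batch["sequence"], truncation=True, max_length=1024)

train_ds = Dataset.from_dict({"sequence": train_seqs}).map(tokenize, batched=True)
eval_ds = Dataset.from_dict({"sequence": eval_seqs}).map(tokenize, batched=True)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # next-token loss
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to="none"),
    data_collator=collator,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)

# Perplexity on the filtered target before and after fine-tuning on proxy data.
ppl_before = math.exp(trainer.evaluate()["eval_loss"])
trainer.train()
ppl_after = math.exp(trainer.evaluate()["eval_loss"])
print(f"Target perplexity: {ppl_before:.2f} -> {ppl_after:.2f}")
```

A large drop in target perplexity after fine-tuning on proxy data would indicate that filtering the target sequences alone did not remove the relevant capability.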
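The linear-probing claim can likewise be sketched as a minimal experiment: freeze the pretrained model, mean-pool its hidden states into one embedding per sequence, and fit a logistic-regression probe for a label such as virulence. Again, the checkpoint name and the toy sequences and labels below are assumptions for illustration, not the paper's data.

```python
# Illustrative sketch: linear probe on frozen genome-LM representations.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "example-org/genome-lm-small"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def embed(sequences):
    """Return one mean-pooled last-hidden-state embedding per sequence."""
    feats = []
    for seq in sequences:
        inputs = tokenizer(seq, return_tensors="pt", truncation=True, max_length=1024)
        hidden = model(**inputs).last_hidden_state           # (1, seq_len, dim)
        feats.append(hidden.mean(dim=1).squeeze(0).numpy())  # (dim,)
    return np.stack(feats)

# Placeholder data: nucleotide sequences with a binary label (e.g., virulent or not).
sequences = ["ACGTACGT" * 32, "GGCCGGCC" * 32, "ATATATAT" * 32, "CGCGCGCG" * 32]
labels = np.array([1, 0, 1, 0])

X = embed(sequences)
probe = LogisticRegression(max_iter=1000).fit(X, labels)  # linear probe on frozen features
print("Probe training accuracy:", probe.score(X, labels))
```

If such a probe predicts the label well without any fine-tuning, the dual-use signal was already present in the pretrained representations.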
Submission Number: 27