LLM Novice Uplift on Dual-Use, In Silico Biology Tasks: A Multi-Benchmark Assessment

Published: 01 Mar 2026, Last Modified: 03 Mar 2026 · ICLR 2026 AIWILD · CC BY 4.0
Keywords: AI Safety, Large Language Models, Biosecurity, Uplift
TL;DR: We find that novices with access to frontier LLMs are 4.16× more accurate on dual-use biosecurity tasks than those with only internet access, providing the first large-scale empirical measurement of LLM-enabled uplift in this domain.
Abstract: Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they \textit{uplift} novice users---i.e., enable humans to perform better than with internet-only resources. This uncertainty is central to understanding both scientific acceleration and dual-use risk. We conducted a multi-model, multi-benchmark human uplift study comparing novices with LLM access versus internet-only access across eight biosecurity-relevant task sets. Participants worked on complex problems with ample time (up to 13 hours for the most involved tasks). We found that LLM access provided substantial uplift: novices with LLMs were $4.16\times$ more accurate than controls (95% CI $[2.63, 6.87]$). On four benchmarks with available expert baselines (internet-only), novices with LLMs outperformed experts on three of them. Perhaps surprisingly, standalone LLMs often exceeded LLM-assisted novices, indicating that users were not eliciting the strongest available contributions from the LLMs. Most participants (89.6%) reported little difficulty obtaining dual-use-relevant information despite safeguards. Overall, LLMs substantially uplift novices on biological tasks previously reserved for trained practitioners, underscoring the need for sustained, interactive uplift evaluations alongside traditional benchmarks.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 80