rbio1 - training scientific reasoning LLMs with biological world models as soft verifiers

ICLR 2026 Conference Submission 21459 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: reasoning models, virtual cell models, transcriptomics, scientific reasoning
TL;DR: Training scientific reasoning LLMs with biological world models as soft verifiers
Abstract: Reasoning models are typically trained against verification mechanisms in formally specified systems such as code or symbolic math. In open domains like biology, however, we lack exact rules to enable large-scale formal verification and instead often rely on lab experiments to test predictions. Such experiments are slow, costly, and cannot scale with computation. In this work, we show that world models of biology or other prior knowledge can serve as approximate oracles for soft verification, allowing reasoning systems to be trained without additional experimental data. We present two paradigms for training models with approximate verifiers: RLEMF (reinforcement learning with experimental model feedback) and RLPK (reinforcement learning from prior knowledge). Using these paradigms, we introduce rbio1, a reasoning model for biology post-trained from a pretrained LLM with reinforcement learning, using learned biological models for verification during training. We demonstrate that soft verification can distill biological world models into rbio1, enabling it to achieve state-of-the-art performance on perturbation prediction in the PerturbQA benchmark. We present rbio1 as a proof of concept that predictions from biological models can train powerful reasoning systems using simulations rather than experimental data, offering a new paradigm for model training.
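The abstract describes the mechanism only at a high level: a learned biological world model scores the LLM's candidate answers, and that score replaces an exact verifier as the reward signal during RL post-training. The following is a minimal, hypothetical sketch of what such a soft-verification reward could look like; the `PerturbationWorldModel` stub, the yes/no answer format, and the reward shaping are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of soft verification with a biological world model.
# The world model here is a toy stub; in the paper's setting it would be a
# learned model (e.g., a transcriptomics / virtual-cell model) whose
# probabilistic predictions act as an approximate oracle.

from dataclasses import dataclass


@dataclass
class PerturbationQuery:
    gene: str     # perturbed gene (e.g., a knockdown target)
    readout: str  # downstream gene or phenotype being asked about


class PerturbationWorldModel:
    """Stand-in for a learned biological world model (assumption)."""

    def prob_effect(self, query: PerturbationQuery) -> float:
        # A real model would return a calibrated probability that the
        # perturbation changes the readout; here we use a fixed toy table.
        toy = {("TP53", "CDKN1A"): 0.92, ("GATA1", "HBB"): 0.75}
        return toy.get((query.gene, query.readout), 0.5)


def soft_verification_reward(answer: str, query: PerturbationQuery,
                             world_model: PerturbationWorldModel) -> float:
    """Reward an LLM's yes/no answer by its agreement with the world
    model's probability, rather than with a ground-truth label."""
    p = world_model.prob_effect(query)
    agreement = p if answer.strip().lower() == "yes" else 1.0 - p
    # Map agreement in [0, 1] to a reward in [-1, 1] for RL post-training.
    return 2.0 * agreement - 1.0


if __name__ == "__main__":
    wm = PerturbationWorldModel()
    q = PerturbationQuery(gene="TP53", readout="CDKN1A")
    for candidate in ("yes", "no"):
        print(candidate, soft_verification_reward(candidate, q, wm))
```

In an actual RL loop (e.g., GRPO- or PPO-style post-training), this reward would be computed for each sampled completion, so the policy is pushed toward answers the world model considers likely without requiring new experimental labels.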
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 21459