BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Adibvafa Fallahpour; Andrew Magnuson; Purav Gupta; Shihao Ma; Jack Naimer; Arnav Shah; Haonan Duan; Omar Ibrahim; Hani Goodarzi; Chris J. Maddison; BO WANG

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Adibvafa Fallahpour, Andrew Magnuson, Purav Gupta, Shihao Ma, Jack Naimer, Arnav Shah, Haonan Duan, Omar Ibrahim, Hani Goodarzi, Chris J. Maddison, BO WANG

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Biological Reasoning, DNA Foundation Models, Large Language Models (LLMs), Reinforcement Learning, Multimodal

TL;DR: BioReason introduces a novel DNA-LLM architecture where the LLM directly processes genomic information, achieving superior, interpretable multi-step biological reasoning and accelerating mechanistic discovery.

Abstract: Unlocking deep and interpretable biological reasoning from complex genomic data remains a major AI challenge limiting scientific progress. While current DNA foundation models excel at representing sequences, they struggle with multi-step reasoning and lack transparent, biologically meaningful explanations. BioReason addresses this by tightly integrating a DNA foundation model with a large language model (LLM), enabling the LLM to directly interpret and reason over genomic information. Through supervised fine-tuning and reinforcement learning, BioReason learns to produce logical, biologically coherent deductions. It achieves major performance gains, boosting KEGG-based disease pathway prediction accuracy from 86% to 98% and improving variant effect prediction by an average of 15% over strong baselines. BioReason can reason over unseen biological entities and explain its decisions step by step, offering a transformative framework for interpretable, mechanistic AI in biology. All data, code, and checkpoints are available at [https://github.com/bowang-lab/BioReason](https://github.com/bowang-lab/BioReason).

Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)

Submission Number: 18371

Loading