ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge

ICLR 2026 Conference Submission 15749 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Model, LLM for Chemistry, Reasoning LLM
TL;DR: We develop ChemDFM-R, a chemical reasoning LLM, under constrained resources using atomized-chemical-knowledge-enhanced domain pre-training, instruction tuning, mix-sourced distillation, and reinforcement learning.
Abstract: While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the field of chemistry and develop a chemical reasoning LLM, ChemDFM-R. We first construct ChemFG, a comprehensive dataset of atomized chemical knowledge that annotates the functional groups present in molecules and how those groups change during chemical reactions, to strengthen the model's grasp of the fundamental principles and internal logic of chemistry. We then propose a mix-sourced distillation method that integrates expertise in atomized knowledge with general reasoning skills, followed by domain-specific reinforcement learning to further enhance chemical reasoning. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs. Further case studies illustrate how explicit reasoning chains improve the model's reliability, transparency, and practicality in real-world human-AI collaboration scenarios.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 15749