Keywords: Chemical Reasoning LLM, Computational Chemistry, Code-Augmented Reasoning, Reinforcement Learning
TL;DR: We present ChemReason, a code-driven reasoning language model with dynamic chemical verification, enabling verifiable molecular generation and optimization and achieving state-of-the-art performance on open molecular tasks.
Abstract: In chemistry, most research on large language models has centered on knowledge question answering and retrieval. However, these approaches fall short on core tasks such as open molecular generation and optimization: they lack explicit reasoning processes, and their outputs cannot be systematically verified, which leads to severe scientific hallucination. To overcome these limitations, we propose ChemReason, a chemical LLM grounded in generative code reasoning. Unlike non-reasoning LLMs, ChemReason is a code-driven reasoning model that provides transparent inference for molecular editing, generation, and optimization. By dynamically generating and executing chemical verification code during reasoning, the model validates each step, ensuring the scientific reliability of its answers under open-domain conditions. On the TOMG benchmark, ChemReason achieves state-of-the-art performance and demonstrates end-to-end verifiable inference for open molecular tasks. More broadly, the proposed "code–verification–reflection" paradigm offers an extensible pathway for AI for Science and a generalizable architecture for addressing complex scientific computing challenges.
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 15402