MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models
Keywords: LLM, reasoning, chemistry, science, ai4science, reinforcement learning
TL;DR: LLMs need certain prerequisites before RL fine-tuning can succeed at chemical reasoning. Mid-stage pretraining that targets symbolic competence and latent domain knowledge is key to unlocking reasoning abilities.
Abstract: Large Language Models (LLMs) acquire emergent reasoning capabilities when fine-tuned in an online setting with simple rule-based rewards. Recent studies, however, indicate that success is conditioned on the latent solvability of tasks under the base LLM: RL can only amplify answers to which the base model already assigns non-negligible probability. This work investigates the emergence of chemical reasoning capabilities and what these prerequisites mean for chemistry. We identify two necessary conditions for RL-based chemical reasoning: 1) symbolic competence and 2) latent domain knowledge. We propose MiST, a set of mid-stage training techniques to satisfy both conditions, including data mixing with SMILES-aware preprocessing and continued pre-training on a rich data mixture of 2.9B tokens. These steps raise the latent-solvability score on IUPAC-to-SMILES translation by 2x and enable RL to lift top-1 accuracy on challenging reaction-prediction tasks from 4.1% to 25.2%, while producing faithful reasoning traces. Our results define clear prerequisites for chemical reasoning training and highlight the broader role of mid-stage pre-training in unlocking reasoning capabilities.
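The abstract's "simple rule-based rewards" for reaction prediction can be illustrated with a minimal sketch. The paper does not specify its reward implementation; the snippet below is a plausible assumption: a binary reward that canonicalizes both predicted and reference SMILES with RDKit so that different spellings of the same molecule are scored as a match. The function names (`canonical_smiles`, `rule_based_reward`) are hypothetical, not the authors' code.

```python
from typing import Optional

from rdkit import Chem  # standard open-source cheminformatics toolkit


def canonical_smiles(smiles: str) -> Optional[str]:
    """Return the canonical SMILES for a string, or None if it fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def rule_based_reward(prediction: str, reference: str) -> float:
    """Binary reward: 1.0 iff prediction and reference denote the same molecule.

    Unparseable predictions earn 0.0, which also penalizes symbolically
    malformed outputs, one of the failure modes mid-stage training targets.
    """
    pred = canonical_smiles(prediction)
    ref = canonical_smiles(reference)
    return 1.0 if pred is not None and pred == ref else 0.0


# Two SMILES spellings of ethanol earn full reward after canonicalization.
assert rule_based_reward("OCC", "CCO") == 1.0
```

A reward of this form is verifiable and cheap, which is what makes online RL with rule-based rewards practical; the open question the paper addresses is whether the base model assigns enough probability to correct answers for such a sparse signal to be amplified.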
Supplementary Material: pdf
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 13274