MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

ICML 2025 Workshop FM4LS Submission 41 Authors

Published: 12 Jul 2025, Last Modified: 12 Jul 2025, FM4LS 2025, CC BY 4.0
Keywords: LLM, reasoning, chemistry, science, ai4science, reinforcement learning, drug discovery
TL;DR: LLMs need certain prerequisites before RL fine-tuning can succeed at chemical reasoning. Pre-training that targets symbolic competence and latent domain knowledge is key to unlocking reasoning abilities.
Abstract: Large Language Models (LLMs) can acquire emergent reasoning via online fine-tuning with simple rule-based rewards when tasks are already latent-solvable by the base model. We study chemical reasoning and identify two prerequisites for RL-based training: (1) symbolic competence and (2) latent domain knowledge. We introduce MiST, a mid-stage training protocol that combines continued pre-training on 2.9B tokens of a SMILES-aware data mix with supervised fine-tuning on chain-of-thought reasoning data. MiST doubles the latent-solvability score of IUPAC-to-SMILES translation and increases the precision of reaction prediction from 4.1% to 25.2%, while producing faithful reasoning traces. Our work defines clear prerequisites for chemical reasoning and underscores the value of mid-stage pre-training.
Submission Number: 41
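
The abstract does not specify the exact reward used during online fine-tuning. As a hedged illustration only, a "simple rule-based reward" for SMILES-output tasks such as IUPAC-to-SMILES translation is often canonical-SMILES exact match; the sketch below uses RDKit for this, and the function name smiles_reward and the binary 0/1 scoring are assumptions rather than the authors' implementation.

# A minimal sketch (not the paper's code) of a rule-based reward for
# SMILES prediction: 1.0 if the model's output denotes the same molecule
# as the reference after canonicalization, else 0.0. Requires RDKit.
from rdkit import Chem

def smiles_reward(predicted: str, reference: str) -> float:
    """Rule-based reward: exact match of canonical SMILES strings."""
    pred_mol = Chem.MolFromSmiles(predicted)
    ref_mol = Chem.MolFromSmiles(reference)
    if pred_mol is None or ref_mol is None:
        return 0.0  # unparsable SMILES earns no reward
    # MolToSmiles is canonical by default, so chemically identical
    # molecules map to the same string.
    return 1.0 if Chem.MolToSmiles(pred_mol) == Chem.MolToSmiles(ref_mol) else 0.0

# Example: "OCC" and "CCO" are syntactically different SMILES for ethanol.
print(smiles_reward("OCC", "CCO"))  # 1.0 -- same molecule after canonicalization

Canonicalization makes the reward invariant to syntactically different but chemically identical SMILES, which is why naive string equality would under-reward correct answers.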