MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models
Keywords: LLM, reasoning, chemistry, science, ai4science, reinforcement learning
TL;DR: LLMs need certain prerequisites before RL fine-tuning can succeed at chemical reasoning. Mid-stage pretraining that targets symbolic competence and latent domain knowledge is key to unlocking reasoning abilities.
Abstract: Large Language Models (LLMs) acquire emergent reasoning capabilities when fine-tuned in an online setting with simple rule-based rewards. Recent studies, however, indicate that success is conditioned on the latent solvability of tasks under the base LLM: RL can only amplify answers to which the base model already assigns non-negligible probability. This work investigates the emergence of chemical reasoning capabilities and what these prerequisites mean for chemistry. We identify two necessary conditions for RL-based chemical reasoning: 1) symbolic competence and 2) latent domain knowledge. We propose MiST, a set of mid-stage training techniques to satisfy both conditions, including data mixing with SMILES-aware preprocessing and continued pre-training on a rich data mixture of 2.9B tokens. These steps raise the latent-solvability score on IUPAC-to-SMILES translation by 2x and enable RL to lift top-1 accuracy on challenging reaction-prediction tasks from 4.1% to 25.2%, while producing faithful reasoning traces. Our results define clear prerequisites for chemical reasoning training and highlight the broader role of mid-stage pre-training in unlocking reasoning capabilities.
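The abstract's "simple rule-based rewards" for reaction prediction can be illustrated with a minimal sketch. The paper does not specify its reward implementation; the snippet below is a plausible assumption: a binary reward that canonicalizes both predicted and reference SMILES with RDKit so that different spellings of the same molecule are scored as a match. The function names (`canonical_smiles`, `rule_based_reward`) are hypothetical, not the authors' code.

```python
from typing import Optional

from rdkit import Chem  # standard open-source cheminformatics toolkit


def canonical_smiles(smiles: str) -> Optional[str]:
    """Return the canonical SMILES for a string, or None if it fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def rule_based_reward(prediction: str, reference: str) -> float:
    """Binary reward: 1.0 iff prediction and reference denote the same molecule.

    Unparseable predictions earn 0.0, which also penalizes symbolically
    malformed outputs, one of the failure modes mid-stage training targets.
    """
    pred = canonical_smiles(prediction)
    ref = canonical_smiles(reference)
    return 1.0 if pred is not None and pred == ref else 0.0


# Two SMILES spellings of ethanol earn full reward after canonicalization.
assert rule_based_reward("OCC", "CCO") == 1.0
```

A reward of this form is verifiable and cheap, which is what makes online RL with rule-based rewards practical; the open question the paper addresses is whether the base model assigns enough probability to correct answers for such a sparse signal to be amplified.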
Supplementary Material: pdf
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 13274