Track: Track 1: Original Research/Position/Education/Attention Track
Abstract: AI-driven materials discovery has made remarkable strides in generating stable, unique, and novel structures at unprecedented scale, yet a critical gap remains between generation and realization: without valid synthesis routes, proposed materials stay in silico. Existing approaches to precursor prediction (PP), the first and most consequential step of material synthesis planning (MSP), formulate it as a single-step problem. They overlook the well-established domain knowledge that solid-state reactions proceed pairwise, and they rely on text-mined datasets known to suffer from extraction errors, chemical invalidity, and systematic compositional bias. We introduce Retro-Forge, a multi-step pairwise retrosynthesis framework that, for the first time, casts PP as a sequence of learnable pairwise reactions. We train a single-step pairwise reactant prediction model on a chemically valid Pairwise Reaction Dataset (PRD, $\sim$6k entries), expanded through synthetic data augmentation to address data scarcity, and compose it recursively via tree search to produce complete synthesis routes. Retro-Forge matches state-of-the-art PP baselines under a contamination-free evaluation protocol, demonstrating that the multi-step pairwise formulation is both learnable and effective, and that further advances in pairwise reactant prediction will directly translate to improved synthesis route discovery.
Keywords: Precursor Prediction, Pairwise Reaction, Retrosynthesis, Data Augmentation, Material Science
Submission Number: 78