Bridging the Gap Between AI Predictions and Chemical Conventions: Template-Guided Reranking for Accurate Reagent Set Suggestion

Published: 30 May 2026, Last Modified: 01 Jun 2026. Venue: SPIGM @ ICML Poster. License: CC BY 4.0
Keywords: Reagent recommendation, reaction condition prediction, reranking, reaction templates, long-tailed distribution, statistical inertia, uncertainty-aware prediction, applications in chemistry
TL;DR: TAR re-prioritizes ML-predicted reagent sets via template-derived conditional probabilities, recovering rare but essential candidates marginalized by statistical inertia, improving Top-1 exact match by 4.54–9.73% across three architectures.
Abstract: Reliable reagent set recommendation, i.e., selecting a compatible combination of catalysts, solvents, and agents as an integrated unit, is essential for autonomous chemical synthesis and computer-aided synthesis planning. While machine learning approaches have been adopted for this task, they frequently succumb to statistical inertia: they disproportionately favor high-frequency reagents and relegate mechanistically appropriate but rare candidates to lower ranks, even when those candidates are present within the model's candidate pool. Moreover, existing approaches often fail to account for the collective compatibility of the full reagent set. To address this, we propose Transformation-Aware Reranking (TAR), a post-hoc framework that identifies the structural transformation underlying each reaction via multi-scale reaction templates. TAR re-prioritizes candidate reagent sets predicted by ML models by evaluating them as integrated units against historically proven combinations within specific mechanistic contexts, anchoring the exploratory power of neural models to chemically consistent protocols. Furthermore, an entropy-based adaptive weighting scheme modulates the influence of these mechanistic priors, balancing specificity for highly specialized transformations against flexibility for versatile ones. Applied to three representative ML architectures, TAR consistently improves Top-1 exact match accuracy by 4.54–9.73%, effectively recovering mechanistically essential candidates marginalized by global frequency bias and delivering reliable, laboratory-ready synthesis protocols.
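The reranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give TAR's actual scoring rule, so everything below is an assumption made for illustration: the function names (`normalized_entropy`, `rerank`), the use of a single template level rather than multi-scale templates, the linear blend of model score and template-conditional probability, and the choice of weight `w = 1 - normalized entropy`. The reagent names in the usage example are likewise hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(counts):
    """Shannon entropy of the reagent-set distribution, scaled to [0, 1]."""
    positive = [c for c in counts.values() if c > 0]
    if len(positive) < 2:
        return 0.0  # zero or one observed set: treat as fully specialized
    total = sum(positive)
    h = -sum((c / total) * math.log(c / total) for c in positive)
    return h / math.log(len(positive))


def rerank(candidates, template_counts):
    """Rerank (reagent_set, model_score) pairs for one reaction template.

    template_counts maps each historically observed reagent set (a frozenset)
    to its frequency under the template matched to the reaction.
    """
    total = sum(template_counts.values())
    if total == 0:
        # No template statistics available: fall back to the model ranking.
        return sorted(candidates, key=lambda c: c[1], reverse=True)
    # Assumed weighting: low entropy (specialized template) -> trust the prior.
    w = 1.0 - normalized_entropy(template_counts)
    scored = []
    for reagents, model_score in candidates:
        prior = template_counts.get(reagents, 0) / total  # P(set | template)
        scored.append((reagents, (1.0 - w) * model_score + w * prior))
    return sorted(scored, key=lambda c: c[1], reverse=True)


# Hypothetical usage: the model prefers a globally common set, but the
# template statistics favor a rarer, template-specific one.
rare = frozenset({"Pd(PPh3)4", "K2CO3", "dioxane"})
common = frozenset({"Et3N", "DCM"})
other = frozenset({"Pd(OAc)2", "K3PO4", "toluene"})
ranked = rerank([(common, 0.5), (rare, 0.2)], Counter({rare: 9, other: 1}))
```

Under these assumptions, a template whose history is concentrated on a few reagent sets pulls the ranking toward those sets, while a template with a flat history leaves the model's original ordering essentially unchanged.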
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 199