Chemist-aligned retrosynthesis by ensembling diverse inductive bias models

Krzysztof Maziarz; Guoqing Liu; Austin Tripp; Junren Li; Piotr Gaiński; Marwin Segler

Published: 24 Sept 2025, Last Modified: 15 Oct 2025

Venue: NeurIPS2025-AI4Science Oral

License: CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

Keywords: chemistry, retrosynthesis, reaction prediction, ensemble, transformer, graph neural network

TL;DR: We propose a retrosynthesis model that ensembles two components with diverse inductive biases, and show it outperforms major models by a large margin

Abstract: Chemical synthesis remains a critical bottleneck in the discovery and manufacture of functional small molecules. AI-based synthesis planning models could help find effective syntheses and have made progress in recent years. However, they still struggle with reactions that are infrequent yet critical for synthetic strategy, and they produce hallucinated, incorrect predictions. This hampers the multi-step search algorithms that rely on these models and leads to misalignment with chemists' expectations. Here we propose RetroChimera, a frontier retrosynthesis model built upon two newly developed components with complementary inductive biases, which we fuse using a new framework for integrating predictions from multiple sources via a learning-based ensembling strategy. Through experiments across several orders of magnitude in data scale and across splitting strategies, we show that RetroChimera outperforms all major models by a large margin, demonstrating robustness outside the training data and, for the first time, the ability to learn from even a very small number of examples per reaction class. Moreover, industrial organic chemists prefer predictions from RetroChimera over the reactions it was trained on in terms of quality, revealing high levels of alignment. With the new dimensions that our model unlocks, we anticipate further acceleration towards full lab-in-the-loop automation of synthesis planning and execution.

Submission Number: 489
