ChemPRM: Improving Retrosynthesis by Structured Intermediate Process Reward

Lingfeng Zhang; XIANBIN YE; Lihang Liu; Zhangxiaonan; Xiaomin Fang; Luca Rossi

Published: 30 May 2026, Last Modified: 30 May 2026, ICML2026-AI4Science Poster, CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

Abstract: Retrosynthesis prediction plays a central role in computer-aided drug discovery, as it requires identifying feasible precursor molecules for a given target compound. Despite substantial progress driven by deep learning approaches, existing models often map products directly to reactants without explicitly encoding chemical reasoning, which limits interpretability and can result in chemically implausible predictions. In this work, we propose ChemPRM, a structured framework for single-step retrosynthesis that decomposes prediction into a sequence of chemically interpretable intermediate states. The framework introduces a structured intermediate process reward and applies supervised fine-tuning on explicit intermediate annotations to guide the model toward chemically valid reasoning trajectories. Experiments on the USPTO-50k benchmark demonstrate that ChemPRM achieves performance competitive with state-of-the-art methods while substantially improving interpretability and robustness.
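The abstract does not specify how intermediate states are scored. As a minimal sketch only, assuming a trajectory is an ordered list of annotated stages, a step-level process reward can be computed by exact match against the reference annotation and aggregated by a weighted mean. The stage names (`reaction_center`, `synthons`, `reactants`), the helpers `Step`, `step_rewards` and `process_reward`, and the exact-match and weighted-mean choices are all hypothetical illustrations, not ChemPRM's actual design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One intermediate state in a (hypothetical) retrosynthesis trajectory."""
    stage: str    # hypothetical stage label, e.g. "reaction_center"
    content: str  # model output for this stage, e.g. a SMILES string


def step_rewards(pred: list[Step], ref: list[Step]) -> list[float]:
    """Score each predicted step against the annotated step at the same position.

    Assumption: a step earns 1.0 only if both its stage label and content
    exactly match the reference; missing or extra steps earn 0.0.
    """
    n = max(len(pred), len(ref))
    scores = []
    for i in range(n):
        matched = i < len(pred) and i < len(ref) and pred[i] == ref[i]
        scores.append(1.0 if matched else 0.0)
    return scores


def process_reward(pred: list[Step], ref: list[Step],
                   weights: list[float] | None = None) -> float:
    """Aggregate step rewards by a weighted mean (one illustrative choice)."""
    scores = step_rewards(pred, ref)
    if not scores:
        return 0.0
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("weights must match the number of scored steps")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

For example, with a three-stage reference trajectory for acetamide whose final reactant set is `CC(=O)Cl.N`, a prediction that matches the first two stages but proposes `CC(=O)O.N` as reactants scores 2/3 under uniform weights. Unlike a single end-point reward on the final reactants, a step-level reward of this kind still gives partial credit when the early reasoning is correct.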

Keywords: Chemistry, Large Language Model

Submission Number: 84
