Learning Condensed Graph via Differentiable Atom Mapping for Reaction Yield Prediction

Ankit Ghosh; Gargee Kashyap; Sarthak Mittal; Nupur Jain; Raghavan B Sunoj; Abir De

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0

TL;DR: Learns to predict yield by first approximating the atom mapping and then building a condensed graph of reaction (CGR), which serves as a surrogate for the transition state.

Abstract: Yield of chemical reactions generally depends on the activation barrier, i.e., the energy difference between the reactant and the transition state. Computing the transition state from the reactant and product graphs requires prior knowledge of the correct node alignment (i.e., atom mapping), which is not available in yield prediction datasets. In this work, we propose YieldNet, a neural yield prediction model, which tackles these challenges. Here, we first approximate the atom mapping between the reactants and products using a differentiable node alignment network. We then use this approximate atom mapping to obtain a noisy realization of the condensed graph of reaction (CGR), which is a supergraph encompassing both the reactants and products. This CGR serves as a surrogate for the transition state graph structure. The CGR embeddings of different steps in a multi-step reaction are then passed into a transformer-guided reaction path encoder. Our experiments show that YieldNet can predict the yield more accurately than the baselines. Furthermore, the model is trained only under the distant supervision of yield values, without requiring fine-grained supervision of atom mapping.
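The abstract does not spell out how the node alignment network or the noisy CGR is computed. As a rough illustration only, the sketch below assumes Sinkhorn normalization as the differentiable relaxation of a permutation (a common choice swapped in here, not confirmed by the source) and forms a soft CGR by comparing the reactant adjacency with the product adjacency re-indexed through the soft mapping. The names `sinkhorn` and `soft_cgr`, the temperature, and the hand-set toy score matrix are all hypothetical; in a trained model the scores would come from learned atom embeddings.

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log-sum-exp along one axis (dims kept for broadcasting).
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn(scores, n_iters=50, temperature=0.1):
    """Relax a square score matrix into an approximately doubly stochastic
    matrix (assumed stand-in for the paper's alignment network output)."""
    log_p = np.asarray(scores, dtype=float) / temperature
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # rows sum to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)

def soft_cgr(adj_reactant, adj_product, mapping):
    """Re-index the product graph into reactant atom order via the soft
    mapping, then record how each bond changes across the reaction."""
    aligned_product = mapping @ adj_product @ mapping.T
    bond_change = aligned_product - adj_reactant  # -1 broken, +1 formed
    return aligned_product, bond_change

# Toy reaction on 3 atoms: bond 0-1 breaks and bond 1-2 forms.
adj_reactant = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
# Product atoms are stored in a different order: reactant atom i is
# product atom perm[i]. This ordering is unknown to the model in practice.
perm = [2, 0, 1]
adj_product = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)

# Hypothetical similarity scores that favor the true correspondence.
true_map = np.eye(3)[perm]
scores = 4.0 * true_map

mapping = sinkhorn(scores)
aligned_product, bond_change = soft_cgr(adj_reactant, adj_product, mapping)
print(np.round(bond_change, 3))
```

Because every step is a smooth function of the scores, a yield loss computed downstream could in principle be backpropagated into whatever network produces them, which is the sense in which only yield labels are needed for supervision.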

Lay Summary: Predicting the yield of a chemical reaction is a fundamental challenge in chemistry. A key determinant of yield is the transition state, a transient high-energy structure. Existing deep learning models often overlook this critical aspect because real-world datasets typically lack any transition state information. To address this, we introduce YieldNet, a neural network that first approximates atom mapping using a differentiable node alignment module, allowing it to estimate a continuous condensed graph of the reaction as a surrogate transition state. This surrogate graph is then processed by input-differentiable graph neural networks and a transformer-based reaction path encoder to predict yields across multi-step reactions. Crucially, the model is trained solely under the distant supervision of reaction yield values, without requiring ground-truth atom mappings or transition states. This enables end-to-end learning of chemically meaningful inductive bias from data alone. This work bridges a key gap between physical chemistry and machine learning, offering a practical tool for chemists to screen and optimize reactions more effectively. YieldNet could accelerate discovery in fields ranging from pharmaceuticals to materials science by reducing the need for costly lab experimentation.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Primary Area: Applications->Chemistry, Physics, and Earth Sciences

Keywords: Yield Prediction, GNN, Atom Mapping

Submission Number: 15291
