Keywords: Retrosynthesis, Chemical Knowledge
Abstract: Retrosynthesis, the process of predicting reactants from products, remains a critical challenge in computational chemistry and drug discovery. Although recent deep learning methods have shown strong performance, they rely heavily on reaction datasets, which are limited in both availability and quality. Large-scale unlabeled molecular data encode rich structural patterns that can be leveraged to learn transferable chemical knowledge, yet they remain largely unexplored. In this work, we propose KnowRetro (Knowledge-Guided Retrosynthesis Prediction), a chemically aware framework that learns chemical knowledge from large-scale unlabeled molecules to improve the accuracy and diversity of retrosynthesis prediction. Specifically, KnowRetro first builds a hierarchical knowledge graph (KG) from millions of unlabeled molecules, capturing transformation-relevant relationships among molecules, substructures, and functional groups. It then applies chemically guided pre-training based on substructure decomposition, which encourages the model to capture fundamental reaction patterns. Finally, it is fine-tuned with a KG adapter designed to inject task-relevant knowledge into reactant generation. Extensive experiments demonstrate that KnowRetro achieves high accuracy with improved robustness and diversity in reactant generation.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 15346