Keywords: retrosynthesis, chemistry, diffusion, categorical diffusion, ensemble
TL;DR: We develop a novel ensemble of diffusion models for chemical retrosynthesis and demonstrate the performance of our models through benchmarking and case studies.
Abstract: Methods for automatic chemical retrosynthesis have found recent success through the application of models traditionally built for natural language processing, primarily transformer neural networks. These models have demonstrated significant ability to translate between the SMILES encodings of chemical products and reactants, but are constrained by their autoregressive nature. We propose DiffER, an alternative template-free method for retrosynthesis prediction in the form of categorical diffusion, which allows the entire output SMILES sequence to be predicted in unison. We construct an ensemble of diffusion models which achieves state-of-the-art performance for top-1 accuracy and competitive performance for top-3 and top-5 accuracy. We show that DiffER is a strong baseline for a new class of template-free models and is capable of learning a variety of synthetic techniques used in laboratory settings.
Submission Number: 2