Keywords: retrosynthesis, loss function, template-based, differentiable top-k
TL;DR: Using a new family of loss functions, we improve top-k accuracies for organic retrosynthesis in a baseline model.
Abstract: Retrosynthesis is one of the core tasks in the organic molecule design cycle, yet producing suitable sets of precursors for a desired product remains a computational challenge. Commonly used template-based approaches reduce the problem to a multi-class classification task for single steps. However, reactions in available datasets are noisy and incomplete, which makes standard training methods problematic. In this work, considering that multiple disconnections are possible for a product, we propose training models using differentiable top-k losses. We show that using these loss functions yields improvements in every top-N metric, with little overhead relative to cross-entropy. More powerful models, more diverse and complete datasets, and other methodologies are expected to yield significant improvements on this task when combined with the training approach presented here.
Paper Track: Papers
Submission Category: Automated Chemical Synthesis
Supplementary Material: pdf