CFOM: Lead Optimization For Drug Discovery With Limited Data

Natan Kaminsky, Uriel Singer, Kira Radinsky

Published: 01 Jan 2023, Last Modified: 27 Jul 2024, CIKM 2023, CC BY-SA 4.0

Abstract: Drug development is a long and costly process consisting of several stages that can take many years to complete. One goal of the early stages is to optimize a novel chemical compound to be active against a target protein associated with the disease. Machine learning techniques are often used to improve the discovery and optimization of potential drug candidates. Given an input molecule, the goal of molecule optimization is to produce a new molecule that is chemically similar to the input but has an improved property. We present a novel algorithm that, during optimization, divides a molecule into two disjoint substructures, which we call the molecule chains and the molecule core. Our approach is inspired by how experts design chemical compounds: they start from a fundamental molecular template and add chemical functional groups to it to generate compounds with desired properties. We train a model to generate molecule chains with the desired properties, which are then attached to the molecule core to construct a novel molecule with high similarity to the input molecule. This is achieved by selectively masking pairs of input molecules' chains and cores during training. Additionally, we demonstrate the extension of this approach to data-scarce tasks, such as targeting a drug to a novel protein. We first evaluate our method on standard molecule optimization tasks such as inhibition of glycogen synthase kinase-3 beta (GSK3β). We then empirically compare the model's performance with state-of-the-art algorithms on 21 novel proteins and show superior performance.
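To make the core/chain division concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define how the core is chosen, so the sketch assumes a scaffold-style rule: repeatedly prune atoms that have a single remaining neighbor, treat what is left (rings and the linkers between them) as the core, and treat each connected group of pruned atoms as a chain. The function name `split_core_and_chains`, the bond-list representation, and the toy molecule are all hypothetical.

```python
from collections import defaultdict


def split_core_and_chains(bonds):
    """Split an undirected molecular graph into a core and its side chains.

    Assumption (not taken from the paper): the core is whatever survives
    repeated pruning of atoms with one remaining neighbor; the chains are
    the connected groups of pruned atoms.
    """
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    core = set(adj)
    degree = {atom: len(adj[atom]) for atom in core}
    leaves = [atom for atom in core if degree[atom] == 1]

    # Peel terminal atoms until only rings and ring linkers remain.
    while leaves:
        atom = leaves.pop()
        core.discard(atom)
        for nb in adj[atom]:
            if nb in core:
                degree[nb] -= 1
                if degree[nb] == 1:
                    leaves.append(nb)

    # Group the pruned atoms into connected components (the chains).
    pruned = set(adj) - core
    chains, seen = [], set()
    for start in sorted(pruned):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(nb for nb in adj[atom]
                         if nb in pruned and nb not in component)
        seen |= component
        chains.append(component)
    return core, chains


# Toy molecule: a six-membered ring (atoms 0-5) with a one-atom chain (6)
# on atom 0 and a two-atom chain (7, 8) on atom 3.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (3, 7), (7, 8)]
core, chains = split_core_and_chains(bonds)
print("core atoms:", sorted(core))
print("chains:", [sorted(chain) for chain in chains])
```

Under this assumed rule the two substructures are disjoint by construction. In the setting the abstract describes, a trained model would propose replacement chains to attach to the fixed core; that generation step is outside the scope of this sketch.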