α-MOP: Molecule optimization with α-divergence

Tianfan Fu, Cao Xiao, Lucas M. Glass, Jimeng Sun

Published: 2020, Last Modified: 13 Nov 2024, BIBM 2020, CC BY-SA 4.0

Abstract: Automatic molecule optimization aims to generate a new molecule Y with more desirable properties from an input molecule X. There are two learning strategies for molecule optimization. 1) Maximum Likelihood Estimation (MLE) methods take a set of molecule pairs (X, Y) as training data, where X is the input molecule and Y is the enhanced molecule. However, such molecule pairs are not naturally available in molecule databases and have to be constructed using ad hoc heuristics, which limits the performance of MLE methods. 2) Reinforcement Learning (RL) methods bypass the need for molecule pairs as training data but suffer from poor exploration efficiency, especially in the early phase of learning. To address both challenges, we propose $\alpha$-Molecular oPtimization ($\alpha$-MOP), which uses $\alpha$-divergence to unify the MLE and RL objectives automatically. In the early phase of learning it focuses more on the maximum likelihood objective, then gradually shifts more weight onto the reinforcement learning objective. Evaluated on multiple datasets, $\alpha$-MOP obtains success rates of 49.91% on QED, 49.32% on DRD2, and 56.43% on LogP, outperforming both MLE-based and RL-based molecule optimization methods.
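The shift from the MLE objective to the RL objective can be illustrated with a minimal sketch. It is not the $\alpha$-divergence objective from the paper, whose exact form the abstract does not give. It assumes a plain linear interpolation between two scalar losses, and the names `rl_weight`, `mixed_loss`, `step`, and `total_steps` are hypothetical, chosen only for illustration.

```python
def rl_weight(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule (an assumption, not the paper's rule).

    Returns 0.0 at the start of training (pure MLE) and 1.0 at the end
    (pure RL), clamped to the range [0, 1].
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def mixed_loss(mle_loss: float, rl_loss: float, step: int, total_steps: int) -> float:
    """Blend two scalar losses with a weight that moves from MLE to RL.

    This is a simplified stand-in for the idea described in the abstract:
    early steps are dominated by the MLE term, later steps by the RL term.
    """
    w = rl_weight(step, total_steps)
    return (1.0 - w) * mle_loss + w * rl_loss
```

With this schedule, `mixed_loss(2.0, 4.0, 0, 100)` uses only the MLE term, and the RL term takes over as `step` approaches `total_steps`. A linear ramp is the simplest choice here; other schedules could be substituted without changing the structure.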