Keywords: generative models, latent variable models, molecule property optimization, molecule generation
TL;DR: MolMIM is a latent variable model that learns an informative and clustered latent space that is used with a naive black-box optimization method to achieve state-of-the-art performance in small molecule generation and property optimization.
Abstract: We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints.
Here we introduce MolMIM, a probabilistic auto-encoder for small molecule drug discovery that learns an informative and clustered latent space.
MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-size representation of variable-length SMILES strings.
Since encoder-decoder models can learn representations with ``holes'' of invalid samples, here we propose a novel extension to the MIM training procedure which promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes.
We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty.
We then utilize CMA-ES, a naive black-box, and gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization.
We achieve state-of-the-art results in several constrained single-property optimization tasks and show competitive results in the challenging task of multi-objective optimization.
We attribute the strong results to the structure of MolMIM's learned representation which promotes the clustering of similar molecules in the latent space, whereas CMA-ES is often used as a baseline optimization method.
We also demonstrate MolMIM to be favorable in a compute-limited regime.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/improving-small-molecule-generation-using/code)
1 Reply
Loading