Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design

TMLR Paper3662 Authors

11 Nov 2024 (modified: 31 Jan 2025) | Rejected by TMLR | CC BY 4.0
Abstract: The rapid discovery of new chemical compounds is essential for advancing global health and developing treatments. While generative models show promise in creating novel molecules, two challenges remain: ensuring that the generated molecules are applicable in the real world, and finding such molecules efficiently. To address both, we introduce Conditional Latent Space Molecular Scaffold Optimization (CLaSMO), which combines a Conditional Variational Autoencoder (CVAE) with Latent Space Bayesian Optimization (LSBO) to modify molecules strategically while maintaining similarity to the original input. The LSBO setting improves the sample efficiency of the optimization, and the modification approach yields molecules with a higher chance of real-world applicability. CLaSMO explores substructures of molecules in a sample-efficient manner by performing Bayesian optimization (BO) in the latent space of a CVAE conditioned on the atomic environment of the molecule to be optimized. Our experiments across 22 diverse optimization tasks show that CLaSMO efficiently enhances target properties with minimal substructure modifications, is notably sample-efficient (an essential factor in resource-constrained real-world scenarios), and achieves state-of-the-art results while using a smaller model and dataset than existing methods. We also provide an open-source web application that enables chemical experts to apply CLaSMO in a Human-in-the-Loop setting.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=TWXNivAR6A
Changes Since Last Submission: The paper has been revised in response to the reviewers' comments. The focus of our proposal is now more clearly articulated, new experiments incorporating an additional benchmark methodology (Graph-GA) have been conducted, and new plots have been added to provide a broader perspective on the results. Additionally, detailed information on the hyperparameters of baseline methodologies has been included, and the citation formatting has been corrected.
Assigned Action Editor: ~Stanislaw_Kamil_Jastrzebski1
Submission Number: 3662