Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design

Published: 25 Sept 2025, Last Modified: 25 Sept 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: The rapid discovery of new chemical compounds is essential for advancing global health and developing treatments. While generative models show promise in creating novel molecules, challenges remain in ensuring the real-world applicability of these molecules and in finding them efficiently. To address these challenges, we introduce Conditional Latent Space Molecular Scaffold Optimization (CLaSMO), which integrates a Conditional Variational Autoencoder (CVAE) with Latent Space Bayesian Optimization (LSBO) to strategically modify molecules while preserving similarity to the original input, effectively framing the task as constrained optimization. Our LSBO setting improves the sample efficiency of molecular optimization, and our modification approach yields molecules with a higher chance of real-world applicability. CLaSMO explores substructures of molecules in a sample-efficient manner by performing Bayesian optimization (BO) in the latent space of a CVAE conditioned on the atomic environment of the molecule to be optimized. Our extensive evaluations across diverse optimization tasks, including rediscovery, docking score, and multi-property optimization, show that CLaSMO efficiently enhances target properties, delivers the sample efficiency crucial for resource-limited applications while respecting molecular similarity constraints, achieves state-of-the-art performance, and maintains practical synthetic accessibility. We also provide an open-source web application that enables chemical experts to apply CLaSMO in a Human-in-the-Loop setting.
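The loop described in the abstract (Bayesian optimization over the latent space of a conditional decoder, with a similarity constraint on the result) can be illustrated with a toy sketch. This is not the authors' implementation. `toy_decoder` and `toy_objective` are hypothetical stand-ins for a trained CVAE decoder and a molecular property oracle. The Gaussian-process surrogate with an RBF kernel, the expected-improvement acquisition, and the penalty-based handling of the similarity constraint are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math
import random

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two latent vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale**2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def gp_posterior(z_train, y_train, z_query, noise=1e-4):
    """Posterior mean and std of a zero-mean, unit-variance GP at one query point."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(z_train)]
            for i, a in enumerate(z_train)]
    k_star = [rbf(a, z_query) for a in z_train]
    weights = solve(gram, k_star)  # K^{-1} k_*
    mean = sum(w * y for w, y in zip(weights, y_train))
    var = 1.0 - sum(w * k for w, k in zip(weights, k_star))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement over `best` for a maximization problem."""
    u = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def toy_decoder(z, condition):
    """Hypothetical stand-in for a conditional decoder: maps (z, c) to features."""
    return [math.tanh(zi + ci) for zi, ci in zip(z, condition)]

def toy_objective(candidate, reference, min_similarity=0.4):
    """Hypothetical property score; a similarity floor is enforced as a penalty."""
    similarity = 1.0 / (1.0 + math.dist(candidate, reference))
    score = -sum((x - 0.3) ** 2 for x in candidate)
    return score if similarity >= min_similarity else score - 1.0

def latent_bo(condition, reference, n_init=5, n_iter=10, n_candidates=32, seed=0):
    """Run BO in latent space; returns all evaluated latents and their scores."""
    rng = random.Random(seed)
    dim = len(condition)

    def sample():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    z_obs = [sample() for _ in range(n_init)]
    y_obs = [toy_objective(toy_decoder(z, condition), reference) for z in z_obs]
    for _ in range(n_iter):
        # Standardize scores so the zero-mean, unit-variance GP prior is reasonable.
        mu = sum(y_obs) / len(y_obs)
        sigma = math.sqrt(sum((y - mu) ** 2 for y in y_obs) / len(y_obs)) or 1.0
        y_std = [(y - mu) / sigma for y in y_obs]
        best = max(y_std)
        # Pick the random latent candidate with the highest expected improvement.
        candidates = [sample() for _ in range(n_candidates)]
        z_next = max(
            candidates,
            key=lambda z: expected_improvement(*gp_posterior(z_obs, y_std, z), best),
        )
        z_obs.append(z_next)
        y_obs.append(toy_objective(toy_decoder(z_next, condition), reference))
    return z_obs, y_obs

condition = [0.2, -0.1]  # placeholder for an encoded atomic environment
reference = toy_decoder([0.0, 0.0], condition)  # placeholder for the input molecule
z_obs, y_obs = latent_bo(condition, reference)
print("best score found:", max(y_obs))
```

Each iteration spends exactly one call to the objective, which is the point of the sample-efficiency argument: the surrogate absorbs the cost of ranking candidates, so the expensive oracle (for example, a docking computation) is queried only for the most promising latent point.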
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: A link to a GitHub repository is added to the main manuscript.
Code: https://github.com/onurboyar/CLASMO-TMLR
Supplementary Material: zip
Assigned Action Editor: ~Stanislaw_Kamil_Jastrzebski1
Submission Number: 4765