LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models
TL;DR: We present a state-of-the-art text-to-molecule diffusion model built on a novel, structurally informative molecule latent space.
Abstract: With the emergence of diffusion models as a leading class of generative models, many researchers have proposed molecule generation techniques based on conditional diffusion models. However, the inherent discreteness of molecules makes it difficult for a diffusion model to connect raw data with highly complex conditions such as natural language. To address this, we present a novel latent diffusion model, dubbed LDMol, for text-conditioned molecule generation.
Recognizing that a suitable latent space design is key to diffusion model performance, we employ a contrastive learning strategy to extract a novel feature space from text data that embeds the unique characteristics of molecular structure.
Experiments show that LDMol outperforms existing autoregressive baselines on the text-to-molecule generation benchmark, making it one of the first diffusion models to surpass autoregressive models in generating textual data through a better choice of latent domain.
Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model.
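The contrastive strategy mentioned above can be illustrated with a CLIP-style symmetric InfoNCE objective, where matched (molecule, text) embedding pairs are pulled together and all other pairs in the batch act as negatives. This is a minimal numpy sketch for illustration only; the function name, temperature value, and the use of random embeddings are assumptions, not the paper's actual implementation.

```python
import numpy as np

def info_nce(mol_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Matched (molecule, text) pairs sit on the diagonal of the
    similarity matrix; all other entries serve as negatives.
    Hypothetical sketch -- not the paper's actual training code.
    """
    # L2-normalize so the dot product becomes cosine similarity
    mol = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = mol @ txt.T / temperature          # (B, B) similarity matrix

    def xent_diagonal(l):
        # cross-entropy with the diagonal entry as the target class
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the molecule->text and text->molecule directions
    return 0.5 * (xent_diagonal(logits) + xent_diagonal(logits.T))
```

In the actual model the two inputs would come from trained molecule and text encoders; feeding identical embeddings yields a near-zero loss, while mismatched pairs are penalized.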
Lay Summary: Designing new molecules with computers could one day revolutionize how we discover medicines or create new materials. One exciting way to do this is with powerful AI models called diffusion models, which learn to gradually generate data from pure noise. However, molecules are made up of discrete building blocks, which makes it hard for molecular diffusion models to understand complex instructions such as natural-language text.
To solve this, we created a new model called LDMol. It learns a better way to represent discrete molecules in an informative "hidden" space using a technique called contrastive learning, which helps the model understand how the structure of a molecule relates to the words we use to describe it.
Our experiments show that LDMol performs better than previous methods on tasks where plain text is turned into molecules. Moreover, it can also be utilized in several other useful tasks, such as finding the right text for a given molecule, or even tweaking molecules based on textual instructions.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/jinhojsk515/LDMol
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Diffusion models, Molecule generation, Representation learning
Submission Number: 11710