Synthesis-constrained discrete diffusion for ionizable lipid generation
Keywords: discrete diffusion, molecular generation, ionizable lipids, lipid nanoparticles, mRNA delivery, scaffold conditioning, drug discovery, generative models, graph neural networks, classifier-free guidance
TL;DR: Synthesis-constrained discrete diffusion generates novel ionizable lipids
Abstract: Ionizable lipids are the critical component of lipid nanoparticles for mRNA delivery, yet their discovery remains bottlenecked by library enumeration. Existing machine learning approaches can rank pre-defined candidates but cannot generate novel structures. We introduce synthesis-constrained diffusion, the first deep generative model for ionizable lipids, which embeds combinatorial chemistry constraints directly into scaffold-conditioned generation. Our proof of concept enforces Ugi scaffold integrity by construction: the core bonds formed by the reaction mechanism are held fixed throughout diffusion, while region-aware noise distributions capture the distinct chemistry of ionizable heads versus lipophilic tails. A three-stage curriculum (pretraining on drug-like molecules, domain adaptation on virtual lipids, and property-conditioned fine-tuning) enables learning from limited experimental data. On Ugi-based lipid synthesis, 99% of generated samples are chemically valid with intact scaffolds and 62% are novel. In silico, the top candidate achieves 2× higher predicted transfection than the training mean.
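The scaffold constraint and region-aware noise described in the abstract can be illustrated with a minimal sketch of one forward noising step in a categorical diffusion. This is not the authors' implementation: the function name `noise_step`, the resampling-style transition, and the parameters `region`, `fixed`, `priors`, and `beta` are all assumptions made for illustration. The sketch only shows the idea that scaffold positions are masked out of the noise process while every other position is resampled from the distribution of its own region.

```python
import numpy as np

def noise_step(tokens, region, fixed, priors, beta, rng):
    """One forward noising step of a categorical diffusion (illustrative only).

    tokens : (n,) int array of current categories (e.g. atom or bond types)
    region : (n,) int array indexing rows of `priors` (e.g. 0 = head, 1 = tail)
    fixed  : (n,) bool array, True for scaffold positions that must not change
    priors : (r, k) array, one noise distribution over k categories per region
    beta   : probability that a non-fixed position is resampled at this step
    """
    tokens = np.asarray(tokens)
    region = np.asarray(region)
    fixed = np.asarray(fixed, dtype=bool)
    priors = np.asarray(priors, dtype=float)

    # Positions on the scaffold are excluded from noising by construction.
    resample = (rng.random(tokens.shape[0]) < beta) & ~fixed

    out = tokens.copy()
    for i in np.flatnonzero(resample):
        # Each position draws from the noise distribution of its own region.
        out[i] = rng.choice(priors.shape[1], p=priors[region[i]])
    return out

# Hypothetical toy input: 4 positions, 3 categories, 2 regions.
rng = np.random.default_rng(0)
tokens = np.array([0, 0, 0, 0])
region = np.array([0, 0, 1, 1])
fixed = np.array([True, False, False, True])
priors = np.array([[0.0, 1.0, 0.0],   # region 0 always resamples to category 1
                   [0.0, 0.0, 1.0]])  # region 1 always resamples to category 2
noised = noise_step(tokens, region, fixed, priors, beta=1.0, rng=rng)
```

Because the mask is applied inside the noise process itself, a reverse model trained on such trajectories never has to reconstruct scaffold positions, which is one way to read "by construction" in the abstract.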
Submission Number: 111