Keywords: Diffusion models, Computational drug discovery, Ligand design, protein-conditioned diffusion framework
TL;DR: We present SIDGen (Structure Integrated Diffusion Generator), a protein-conditioned diffusion framework that integrates masked SMILES generation with lightweight folding-derived features for pocket awareness.
Abstract: Designing ligands that are both chemically valid and structurally compatible with protein binding pockets is a key bottleneck in computational drug discovery. Existing approaches either ignore the structural context or rely on expensive, memory-intensive encodings, which limits throughput and scalability. We present SIDGen (Structure Integrated Diffusion Generator), a protein-conditioned diffusion framework that integrates masked SMILES generation with lightweight folding-derived features for pocket awareness. To balance expressivity with efficiency, SIDGen supports two conditioning pathways: a streamlined mode that pools coarse structural signals from protein embeddings, and a full mode that injects localized pairwise biases for stronger coupling. A coarse-stride folding mechanism with nearest-neighbor up-sampling alleviates the quadratic memory cost of pair tensors, enabling training on realistic sequence lengths. Learning stability is maintained through in-loop chemical validity checks and an invalidity penalty, while large-scale training efficiency is recovered through selective compilation, dataloader tuning, and gradient accumulation. In automated benchmarks, SIDGen generates ligands with high validity, uniqueness, and novelty, while achieving strong enrichment in docking-based evaluations (EF, BEDROC) and competitive pose quality (RMSD, score-based ROC/PR). Robust receptor preparation and ligand-derived docking boxes further support reliable assessment across diverse protein families. These results demonstrate that SIDGen can deliver scalable, pocket-aware molecular design, providing a practical route to conditional generation for high-throughput drug discovery.
Primary Area: generative models
Submission Number: 20360
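The coarse-stride idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption: the function names, the `pair_fn` callback, and the choice of block replication (index `i // stride`) as the nearest-neighbor rule are hypothetical and only show why computing pair features on a strided grid reduces the number of evaluated pairs from L^2 to roughly (L/stride)^2.

```python
def coarse_pair_features(length, stride, pair_fn):
    """Evaluate pair features only at every `stride`-th residue index.

    `pair_fn(i, j)` is a hypothetical stand-in for whatever produces a
    pairwise feature for residues i and j.
    """
    anchors = list(range(0, length, stride))
    return [[pair_fn(i, j) for j in anchors] for i in anchors]


def upsample_nearest(coarse, length, stride):
    """Expand the coarse grid back to length x length.

    Each full-resolution position (i, j) copies the coarse cell
    (i // stride, j // stride), i.e. simple block replication.
    """
    return [
        [coarse[i // stride][j // stride] for j in range(length)]
        for i in range(length)
    ]


if __name__ == "__main__":
    L, s = 8, 4
    # Toy pair feature: sequence separation between residues.
    coarse = coarse_pair_features(L, s, lambda i, j: abs(i - j))
    full = upsample_nearest(coarse, L, s)
    n_coarse = sum(len(row) for row in coarse)
    n_full = sum(len(row) for row in full)
    print(f"pairs evaluated: {n_coarse}, pairs after up-sampling: {n_full}")
```

In this toy setting only the coarse grid is ever computed; the full-resolution grid is filled by copying, which trades spatial detail in the pair features for a lower memory footprint during the expensive step.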