ECloudGen: leveraging electron clouds as a latent variable to scale up structure-based molecular design
Abstract: Structure-based molecule generation represents a notable advancement in artificial intelligence-driven drug design. However, progress in this field is constrained by the scarcity of structural data on protein–ligand complexes. Here we propose a latent variable approach that bridges the gap between ligand-only data and protein–ligand complexes, enabling target-aware generative models to explore a broader chemical space and thereby enhancing the quality of molecular generation. Inspired by quantum molecular simulations, we introduce ECloudGen, a generative model that leverages electron clouds as meaningful latent variables. ECloudGen incorporates techniques such as latent diffusion models, Llama architectures and a contrastive learning task, which organizes the chemical space into a structured and highly interpretable latent representation. Benchmark studies demonstrate that ECloudGen outperforms state-of-the-art methods by generating more potent binders with superior physicochemical properties and by covering a broader chemical space. The incorporation of electron clouds as latent variables not only improves generative performance but also introduces model-level interpretability, as illustrated in our case studies. This study presents ECloudGen, which uses latent diffusion to generate electron clouds from protein pockets and decodes them into molecules. The adopted two-stage training expands the chemical space accessible to generative drug design.
External IDs: dblp:journals/ncs/ZhangJWZYYLZZHZZYHZKPWG25