Abstract: Drug discovery is a complex process that involves multiple stages and tasks. However, existing molecular generative models can only tackle some of these tasks. We present *Generalist Molecular generative model* (GenMol), a versatile framework that uses only a *single* discrete diffusion model to handle diverse drug discovery scenarios. GenMol generates Sequential Attachment-based Fragment Embedding (SAFE) sequences through non-autoregressive bidirectional parallel decoding, which lets it use molecular context that does not depend on a specific token ordering and improves sampling efficiency. GenMol uses fragments as basic building blocks for molecules and introduces *fragment remasking*, a strategy that optimizes molecules by regenerating masked fragments, enabling effective exploration of chemical space. We further propose *molecular context guidance* (MCG), a guidance method tailored to the masked discrete diffusion of GenMol. GenMol significantly outperforms the previous GPT-based model in *de novo* generation and fragment-constrained generation, and achieves state-of-the-art performance in goal-directed hit generation and lead optimization. These results demonstrate that GenMol can tackle a wide range of drug discovery tasks, providing a unified and versatile approach for molecular design.
Lay Summary: Drug discovery is a complicated process with many steps, but most current AI models for generating molecules can only handle a few of these tasks. This limits their usefulness in real-world drug development.
In this work, we introduce GenMol, a flexible AI model designed to handle a wide range of drug discovery tasks using just one system. GenMol performs parallel generation, meaning it can generate molecules efficiently without depending on the order of tokens in molecular sequences. GenMol has versatile generation capabilities: it builds molecules by combining small molecular substructures called fragments and improves them by regenerating selected fragments through a process called fragment remasking. GenMol also adopts a scheme called molecular context guidance (MCG), which helps it calibrate its own predictions to make full use of the given molecular information.
GenMol outperforms existing methods, including those based on GPT, in several key tasks like designing new molecules and optimizing drug candidates. It offers a powerful, all-in-one tool for faster and more effective drug discovery.
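To make the fragment remasking idea above concrete, the following is a minimal toy sketch, not GenMol's implementation. It assumes a SAFE-like string in which fragments are separated by dots, uses placeholder fragment labels rather than real chemistry, and replaces the diffusion model with a hypothetical `toy_denoiser` that picks a fragment at random from a small vocabulary. All function names here are illustrative assumptions.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption, not GenMol's token)


def split_fragments(safe_like: str) -> list[str]:
    """Split a dot-separated, SAFE-like string into its fragments."""
    return safe_like.split(".")


def remask_fragment(fragments: list[str], index: int) -> list[str]:
    """Return a copy of the fragment list with one fragment masked."""
    masked = list(fragments)
    masked[index] = MASK
    return masked


def toy_denoiser(masked: list[str], vocabulary: list[str],
                 rng: random.Random) -> list[str]:
    """Hypothetical stand-in for the generative model.

    A real model would predict the masked tokens from the surrounding
    molecular context; here each mask is filled with a random fragment.
    """
    return [rng.choice(vocabulary) if f == MASK else f for f in masked]


def fragment_remasking_step(safe_like: str, vocabulary: list[str],
                            rng: random.Random) -> tuple[str, int]:
    """Mask one randomly chosen fragment and regenerate it."""
    fragments = split_fragments(safe_like)
    index = rng.randrange(len(fragments))
    masked = remask_fragment(fragments, index)
    regenerated = toy_denoiser(masked, vocabulary, rng)
    return ".".join(regenerated), index


if __name__ == "__main__":
    rng = random.Random(0)
    # "fragA", "fragB", ... are placeholder labels, not valid SAFE fragments.
    candidate, changed = fragment_remasking_step(
        "fragA.fragB.fragC", ["fragX", "fragY"], rng
    )
    print(changed, candidate)
```

In an optimization loop, a step like this would typically be repeated, keeping regenerated molecules that score better on the target property. That scoring and selection logic is omitted here because the text above does not specify it.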
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: drug discovery, molecule generation, discrete diffusion
Submission Number: 8383