TL;DR: We introduce Atomic GFlowNets, a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively.
Abstract: Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse, high-quality molecular structures by learning to sample in proportion to a reward treated as an unnormalized distribution. Previous work in this framework often restricts exploration by using predefined molecular fragments as building blocks, which limits the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model that uses individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors such as drug-likeness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards that guide A-GFNs towards regions of chemical space with desirable pharmacological properties. We further implement a goal-conditioned fine-tuning process, which adapts A-GFNs to optimize for specific target properties. We pretrain A-GFN on a subset of the ZINC dataset and, using robust evaluation metrics, demonstrate the effectiveness of our approach relative to relevant baseline methods across a wide range of drug design tasks. The code is accessible at https://github.com/diamondspark/AGFN.
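To make the idea of descriptor-based proxy rewards more concrete, the following is a minimal sketch, assuming that each precomputed descriptor (QED for drug-likeness, TPSA, and SA score) is mapped to a desirability in [0, 1] and that the per-descriptor values are multiplied together. The names `Window`, `desirability`, `proxy_reward`, `PRETRAIN_TARGETS`, and `FINETUNE_TARGETS`, as well as every threshold, are hypothetical and chosen only for illustration. The sketch is not the A-GFN reward, and it models goal conditioning only by swapping the target windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical desirable range [low, high] for one descriptor."""
    low: float
    high: float
    slack: float  # width of the linear ramp from 1.0 down to 0.0 outside the window


def desirability(value: float, w: Window) -> float:
    """Return 1.0 inside the window, decaying linearly to 0.0 over `slack` outside it."""
    if w.low <= value <= w.high:
        return 1.0
    gap = w.low - value if value < w.low else value - w.high
    return max(0.0, 1.0 - gap / w.slack)


def proxy_reward(descriptors: dict[str, float],
                 targets: dict[str, Window],
                 eps: float = 1e-6) -> float:
    """Multiply per-descriptor desirabilities into one scalar reward.

    The floor `eps` keeps the reward strictly positive, because a GFlowNet
    samples in proportion to R(x) and is usually trained on log R(x).
    """
    r = 1.0
    for name, window in targets.items():
        r *= desirability(descriptors[name], window)
    return max(r, eps)


# Hypothetical pre-training targets (illustrative thresholds only).
PRETRAIN_TARGETS = {
    "qed": Window(0.6, 1.0, 0.3),      # drug-likeness, higher is better
    "tpsa": Window(20.0, 140.0, 20.0),  # topological polar surface area, in square angstroms
    "sa": Window(1.0, 4.0, 2.0),       # synthetic accessibility, lower is easier
}

# Hypothetical "goal" for fine-tuning: the same descriptors with a tighter TPSA range.
FINETUNE_TARGETS = {**PRETRAIN_TARGETS, "tpsa": Window(20.0, 90.0, 10.0)}
```

For example, a molecule with descriptors `{"qed": 0.7, "tpsa": 150.0, "sa": 3.0}` lies 10 units above the pre-training TPSA window and therefore receives a reward of 0.5. A multiplicative combination is one simple choice: any descriptor that falls far outside its window drives the whole reward toward the `eps` floor.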
Lay Summary: Chemists urgently need faster and more effective ways to design new drug molecules. However, many existing AI tools rely on assembling compounds from pre-made molecular fragments, which limits their ability to explore the full range of chemical possibilities.
We present Atomic GFlowNet, an AI system that builds molecules atom by atom, much like constructing a structure with Lego bricks. This fine-grained approach allows it to access a much wider and more diverse chemical space.
To begin, we train the model on millions of existing drug-like molecules, guiding it with simple, low-cost objectives such as synthetic accessibility and drug-likeness. Once this foundation is established, we can quickly adapt the same model using a small amount of additional data to pursue more challenging goals, like binding to a specific disease-related protein.
In our evaluations, Atomic GFlowNet generated significantly more diverse and promising molecules than leading methods. It was also able to improve known drug candidates after just a single day of computation. This work offers a faster and more comprehensive path to discovering future medicines and other high-value chemical compounds.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/diamondspark/AGFN
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Generative Flow Networks, Foundation model, Drug Discovery
Submission Number: 14523