Scalable and Cost-Efficient de Novo Template-Based Molecular Generation

Published: 06 Mar 2025, Last Modified: 26 Apr 2025, GEM, Everyone, CC BY 4.0
Track: Machine learning: computational method and/or computational results
Nature Biotechnology: No
Keywords: GFlowNets, molecular generation, synthesis
Abstract:

Recent advances in reaction-based molecular generation hold great promise for drug design. Composing a molecule from a predefined set of reaction templates and building blocks keeps generative modeling in line with what can be synthesized in a real-world wet lab. In this paper, we tackle three important challenges of template-based GFlowNets: 1) reducing the synthesis cost, 2) navigating a large set of building blocks, and 3) exploiting a small set of building blocks. We propose Cost Guidance for a backward policy, which uses an auxiliary machine-learning model to approximate the synthesis cost. Our approach limits the cost of proposed molecules while drastically improving their diversity and quality in large-scale settings. Moreover, we design a Dynamic Library mechanism that allows the generation of full synthesis trees, boosting the results in small-scale settings. The resulting generative model establishes state-of-the-art results in template-based molecular generation on a benchmark measuring synthesis cost and diversity of high-reward molecules.
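
As a rough illustration of the idea behind Cost Guidance, and not the authors' implementation, a backward policy could weight the candidate ways of decomposing a molecule by an estimated synthesis cost, so that cheaper routes receive more probability. The sketch below assumes a softmax over negative costs; the function name, the softmax form, the `temperature` parameter, and the example cost values are all hypothetical, and in the paper's setting the costs would come from the auxiliary machine-learning model rather than being hard-coded.

```python
import math


def cost_guided_backward_probs(parent_costs, temperature=1.0):
    """Turn estimated costs of candidate parent states into probabilities.

    Hypothetical sketch: a softmax over negative costs, so a lower
    estimated synthesis cost yields a higher backward probability.
    """
    if not parent_costs:
        raise ValueError("need at least one candidate parent")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [-cost / temperature for cost in parent_costs]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    return [weight / total for weight in weights]


# Made-up cost estimates for three ways of decomposing one molecule.
probs = cost_guided_backward_probs([1.0, 2.0, 4.0])
```

With these made-up costs, the cheapest decomposition receives the largest probability and the most expensive one the smallest; raising `temperature` flattens the distribution toward uniform.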

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: Piotr Gaiński
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 77