LLM-Augmented Chemical Synthesis and Design Decision Programs

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY 4.0
Abstract: Retrosynthesis, the process of breaking down a target molecule into simpler precursors through a series of valid reactions, stands at the core of organic chemistry and drug development. Although recent machine learning (ML) research has advanced single-step retrosynthetic modeling and subsequent route searches, these solutions remain restricted by the extensive combinatorial space of possible pathways. Concurrently, large language models (LLMs) have exhibited remarkable chemical knowledge, hinting at their potential to tackle complex decision-making tasks in chemistry. In this work, we explore whether LLMs can successfully navigate the highly constrained, multi-step retrosynthesis planning problem. We introduce an efficient scheme for encoding reaction pathways and present a new route-level search strategy, moving beyond the conventional step-by-step reactant prediction. Through comprehensive evaluations, we show that our LLM-augmented approach excels at retrosynthesis planning and extends naturally to the broader challenge of synthesizable molecular design.
Lay Summary: Designing a new drug is pointless if chemists can’t figure out how to build it. Retrosynthesis, the backwards puzzle of breaking a complex molecule into buyable pieces, is still a challenging task for today's AI. We turned to large language models, asking them to write the whole recipe at once instead of guessing one move at a time. By inventing a compact “sentence” that encodes an entire reaction pathway and coupling it with a smart search routine, our system explores the maze efficiently. On benchmark tests, it delivers shorter, more practical syntheses than leading planners and can even suggest novel, synthesizable molecules, promising faster, cheaper paths from idea to real-world medicines and materials.
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Large language models, Retrosynthesis planning, Molecule design
Submission Number: 14541