Keywords: Large Language Models, Molecular Reasoning, Retrosynthesis, Cheminformatics
TL;DR: A novel framework that uses LLMs to reason over molecular structures, enabling zero-shot, chemically plausible retrosynthesis without requiring task-specific fine-tuning or labeled data.
Abstract: Applications of machine learning in chemistry are often limited by the scarcity and expense of labeled data, which restricts traditional supervised methods. In this work, we introduce a framework for molecular reasoning with general-purpose Large Language Models (LLMs) that requires no labeled training data. Our method anchors chain-of-thought reasoning to the molecular structure using unique atomic identifiers. First, the LLM performs a one-shot task to identify relevant fragments and their associated chemical labels or transformation classes. In an optional second step, this position-aware information is used in a few-shot task with provided class examples to predict the chemical transformation.
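As an illustrative, minimal sketch (not the authors' implementation; the abstract does not specify how the unique atomic identifiers are assigned), one natural realization is RDKit atom-map numbers, which give every atom in a SMILES string a stable index that a chain-of-thought prompt can reference:

```python
# Hypothetical sketch: tag each atom in a SMILES with a unique map number so
# that an LLM prompt can refer to specific positions (e.g. "the ester at atom 5").
from rdkit import Chem

def add_atom_identifiers(smiles: str) -> str:
    """Return a SMILES in which each atom carries a unique 1-based map number."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    for idx, atom in enumerate(mol.GetAtoms(), start=1):
        atom.SetAtomMapNum(idx)  # the first carbon becomes, e.g., [CH3:1]
    return Chem.MolToSmiles(mol)

# Example: aspirin; the mapped SMILES lets the model name exact reaction sites.
print(add_atom_identifiers("CC(=O)Oc1ccccc1C(=O)O"))
```

The mapped string can then be embedded in the one-shot prompt so that fragment and reaction-class predictions are tied to concrete atom indices rather than ambiguous substructure descriptions.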
We apply our framework to single-step retrosynthesis, a task where LLMs have previously underperformed. Across academic benchmarks and expert-validated drug discovery molecules, our framework enables LLMs to achieve high success rates in identifying chemically plausible reaction sites ($\geq$90%), named reaction classes ($\geq$40%), and final reactants ($\geq$74%). Beyond solving complex chemical tasks, our work also provides a method for generating theoretically grounded synthetic datasets by mapping chemical knowledge onto the molecular structure, thereby addressing data scarcity.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 20767