Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages
TL;DR: Multi-modal Foundation Models can induce molecular graph languages with high generation quality, domain-specificity and built-in interpretability.
Abstract: Recent data-efficient molecular generation approaches exploit graph grammars to introduce interpretability into the generative models. However, grammar learning therein relies on expert annotation or unreliable heuristics for algorithmic inference. We propose Foundation Molecular Grammar (FMG), which leverages multi-modal foundation models (MMFMs) to induce an interpretable molecular language. By exploiting the chemical knowledge of an MMFM, FMG renders molecules as images, describes them as text, and aligns information across modalities using prompt learning. FMG can be used as a drop-in replacement for the prior grammar learning approaches in molecular generation and property prediction. We show that FMG not only excels in synthesizability, diversity, and data efficiency but also offers built-in chemical interpretability for automated molecular discovery workflows. Code is available at https://github.com/shiningsunnyday/induction.
Lay Summary: We show that GPT-4o, accessed through an ordinary browser interface, can perform chemical reasoning tasks at the level of an expert with a PhD in Chemistry. We apply GPT-4o as the decision-maker within an algorithm that learns the rules of a molecular language by breaking down one example molecule at a time. The algorithm hierarchically decomposes a molecule step by step, merging substructures into progressively larger ones and arranging them together. At each step, GPT-4o is asked to select one cell from a grid of images, each cell representing a fork in the algorithm's execution. Each image cell shows the molecule with one or two of its substructures highlighted for visual clarity. We then elicit multi-modal reasoning through chain-of-thought prompting, asking GPT-4o to describe the structures it sees and how different substructures interact before deciding which option is most pivotal to the overall design of the molecule. We chain the explanations from each response together into a design story summarizing the whole execution. We repeat the execution multiple times, possibly obtaining different results each time. To resolve this, we ask GPT-4o to compare which design story is more comprehensive, and obtain final rankings by hosting a tournament in which each "player" is a different possible breakdown of the molecule. The higher-ranked breakdowns are then pooled together to induce the rules of the language: a context-free graph grammar that can be sampled to generate diverse, novel molecules. We demonstrate that FMG outperforms existing state-of-the-art methods on popular molecular generation benchmarks in low-data settings with tens to hundreds of examples. We evaluate FMG's step-by-step reasoning via comprehensive expert-provided case studies and quantitative analyses.
FMG bridges the gap between elusive expert domain knowledge and emergent capabilities of MMFMs using an interpretable workflow backed by technical rigor, semantic flexibility, and expert validation.
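The tournament ranking described in the lay summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `judge` function stands in for the GPT-4o call that compares two design stories for comprehensiveness, replaced here by a deterministic stub so the sketch runs standalone.

```python
import random


def judge(story_a: str, story_b: str) -> str:
    # Stand-in for the MMFM call: in FMG, GPT-4o is prompted to decide
    # which design story is more comprehensive. Here, a deterministic
    # stub simply prefers the longer story.
    return story_a if len(story_a) >= len(story_b) else story_b


def tournament_rank(stories: list[str]) -> list[str]:
    # Single-elimination tournament over design stories: each round,
    # stories are paired and judged winners advance; losers of earlier
    # rounds rank below losers of later rounds, with the champion first.
    players = list(stories)
    losers_by_round = []
    while len(players) > 1:
        random.shuffle(players)
        winners, losers = [], []
        for i in range(0, len(players) - 1, 2):
            a, b = players[i], players[i + 1]
            w = judge(a, b)
            winners.append(w)
            losers.append(b if w == a else a)
        if len(players) % 2 == 1:  # odd player out gets a bye
            winners.append(players[-1])
        losers_by_round.append(losers)
        players = winners
    ranking = players[:]  # the champion
    for losers in reversed(losers_by_round):
        ranking.extend(losers)
    return ranking
```

In FMG, the higher-ranked breakdowns from this procedure are the ones pooled to induce the grammar rules.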
Link To Code: https://github.com/shiningsunnyday/induction
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: molecular grammar, interpretability, foundation model, multi-modal
Submission Number: 8763