Visible to everyone since 04 Oct 2024. License: CC BY 4.0.
Recently, domain-specific languages (DSLs) for molecular generation have shown advantages in data efficiency and interpretability. However, constructing such a DSL requires either human expertise or significant computational cost. Multi-modal foundation models (MMFMs) have shown remarkable in-context abilities on tasks across the vision and text domains, but not on graphs. We explore an unconventional solution: we render the molecule as an image, describe it using text, and cast DSL construction as the equivalent problem of constructing a tree decomposition of the molecular graph. The MMFM makes a chain of discrete decisions that replace the traditional heuristics used during the decomposition, which integrates its prior knowledge smoothly without compromising the soundness of the algorithm. Furthermore, we collect the MMFM’s reasoning for each decision into a design story, have non-expert agents evaluate the stories for correctness and persuasiveness, and close the feedback loop to improve the DSL. Our method, Foundation Molecular Grammar (FMG), demonstrates significant advantages in synthesizability, diversity, and data efficiency on molecule generation benchmarks. Moreover, its compelling chemical interpretability offers built-in transparency over the molecular discovery workflow, paving the way for additional feedback and oversight.
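To make the tree-decomposition framing concrete, the following is a minimal sketch, not the FMG implementation, which the abstract does not specify. It builds a tree decomposition of a small molecular graph by vertex elimination and exposes the one heuristic step as a pluggable `choose` callback. The names `tree_decomposition`, `min_degree_choice`, and `is_valid`, the choice of vertex elimination as the decomposition procedure, and the methylcyclopropane example are all assumptions made for illustration. The min-degree rule stands in for the kind of decision an MMFM could make instead.

```python
def min_degree_choice(work):
    """Hypothetical stand-in for a model decision: pick a lowest-degree atom."""
    return min(work, key=lambda v: (len(work[v]), v))


def tree_decomposition(adj, choose=min_degree_choice):
    """Build a tree decomposition by vertex elimination.

    adj maps each atom to the set of its bonded neighbours (symmetric).
    choose(work) returns the next atom to eliminate from the working graph.
    Returns (bags, tree_edges), where tree_edges index into bags.
    """
    work = {v: set(ns) for v, ns in adj.items()}
    order, bags = [], []
    while work:
        v = choose(work)                  # the only heuristic decision point
        nbrs = work.pop(v)
        for a in nbrs:                    # connect v's remaining neighbours
            work[a].discard(v)
            work[a] |= nbrs - {a}
        order.append(v)
        bags.append(frozenset(nbrs | {v}))
    pos = {v: i for i, v in enumerate(order)}
    tree_edges = []
    for i, v in enumerate(order):
        later = bags[i] - {v}             # neighbours eliminated after v
        if later:                         # attach to the earliest of them
            tree_edges.append((i, min(pos[u] for u in later)))
    return bags, tree_edges


def is_valid(adj, bags, tree_edges):
    """Check the three tree-decomposition properties."""
    for v, ns in adj.items():
        if not any(v in b for b in bags):                  # atom coverage
            return False
        for u in ns:
            if not any(u in b and v in b for b in bags):   # bond coverage
                return False
    tadj = {i: set() for i in range(len(bags))}
    for i, j in tree_edges:
        tadj[i].add(j)
        tadj[j].add(i)
    for v in adj:                         # bags holding v must be connected
        nodes = {i for i, b in enumerate(bags) if v in b}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in tadj[i]:
                if j in nodes and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True


# Heavy-atom graph of methylcyclopropane: ring C1-C2-C3, methyl C4 on C1.
mol = {
    "C1": {"C2", "C3", "C4"},
    "C2": {"C1", "C3"},
    "C3": {"C1", "C2"},
    "C4": {"C1"},
}
bags, tree_edges = tree_decomposition(mol)
width = max(len(b) for b in bags) - 1
```

Any `choose` callback that returns a vertex still present in the working graph yields a valid decomposition, so the decision maker affects the quality of the bags but not correctness. This is one way to read the abstract's claim that model decisions can replace heuristics while the algorithm stays sound.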