Enhancing Large Language Models for Constraint-Driven Molecular Generation and Beyond

TMLR Paper8790 Authors

06 May 2026 (modified: 08 Jun 2026) · Under review for TMLR · CC BY 4.0

Abstract: Most de novo molecule generators attempt to satisfy hard chemical constraints in a single forward pass and offer little guidance when their outputs fall short. We introduce Code-Driven Molecular Synthesis (CDMS), an iterative, model-agnostic framework that embeds a formal self-improving feedback loop into large language models (LLMs). At the start of each task, the LLM translates the chemist’s request into a snippet of executable code, called an inspector, that formalizes the evaluation logic used to guide molecular refinement. The inspector stays fixed throughout refinement and is executed on every candidate molecule at each iteration, producing natural-language critiques that describe how to modify the molecule to better meet the user-defined constraints (e.g., “add a para-hydroxyl group”). These critiques, which we call Programmatic Feedback Gradients, are appended to subsequent prompts, steering the LLM toward progressively refined outputs until all structural and functional requirements are satisfied. CDMS achieves state-of-the-art constraint-satisfaction success rates within a few feedback iterations and without any model retraining. To encourage further research, we release a benchmark dataset curated for code-generated, feedback-driven molecular design (https://anonymous.4open.science/r/CDMS-C08D/).
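
The generate-inspect-critique loop summarized in the abstract can be sketched in Python. This is a toy illustration under stated assumptions, not the authors' CDMS implementation: the LLM is replaced by a hypothetical `mock_llm` stub, `INSPECTOR_SOURCE` is a hand-written stand-in for inspector code an LLM would generate, molecules are SMILES strings, and the inspector's checks are naive string tests rather than calls to a cheminformatics toolkit. Only the control flow mirrors the description: the inspector is built once, run on every candidate, and its critiques are appended to the next prompt until it reports no violations.

```python
from typing import Callable, List, Tuple

Inspector = Callable[[str], List[str]]  # molecule (SMILES) -> critiques
LLM = Callable[[str], str]              # prompt -> molecule (SMILES)

# Hypothetical inspector source, standing in for the code an LLM would write
# once from the chemist's request. The checks are deliberately naive string
# tests; a real inspector would parse the molecule with a toolkit.
INSPECTOR_SOURCE = '''
def inspect(smiles):
    critiques = []
    # Count aromatic carbons (lowercase "c" in SMILES) as a proxy for a benzene ring.
    if smiles.count("c") < 6:
        critiques.append("add an aromatic benzene ring")
    # Treat any oxygen as the hydroxyl group (ignores C=O, ethers, etc.).
    if "O" not in smiles:
        critiques.append("add a hydroxyl group")
    return critiques
'''


def compile_inspector(source: str) -> Inspector:
    """Execute the generated source once and return its `inspect` function.

    exec() runs arbitrary code; generated code should be sandboxed in practice.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace["inspect"]


def mock_llm(prompt: str) -> str:
    """Hypothetical LLM stub that reacts only to critiques present in the prompt."""
    if "add a hydroxyl group" in prompt:
        return "c1ccc(O)cc1"  # phenol
    if "add an aromatic benzene ring" in prompt:
        return "c1ccccc1"     # benzene (drops the OH group)
    return "C1CCCCC1O"        # cyclohexanol: initial guess, not aromatic


def refine(request: str, inspector: Inspector, llm: LLM,
           max_rounds: int = 5) -> Tuple[str, int, bool]:
    """Generate, inspect, and append critiques until the inspector is satisfied.

    Returns (final molecule, feedback rounds used, whether all checks passed).
    """
    prompt = request
    candidate = ""
    for rounds in range(max_rounds + 1):
        candidate = llm(prompt)
        critiques = inspector(candidate)  # the same inspector every round
        if not critiques:
            return candidate, rounds, True
        # Feedback accumulates: each round's critiques are appended to the prompt.
        prompt += "\nFeedback:\n" + "\n".join(f"- {c}" for c in critiques)
    return candidate, max_rounds, False


if __name__ == "__main__":
    request = "Propose a molecule containing a benzene ring and a hydroxyl group."
    inspector = compile_inspector(INSPECTOR_SOURCE)  # built once, then fixed
    print(refine(request, inspector, mock_llm))
```

Running the script prints `('c1ccc(O)cc1', 2, True)`: the initial guess is not aromatic, the first critique yields benzene but loses the oxygen, and the second critique restores it. Accumulating critiques in the prompt, rather than sending only the latest ones, is one design choice among several; the abstract states only that critiques are appended to subsequent prompts.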

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Sungwoong_Kim2

Submission Number: 8790