Controllable Molecule Generation via Sparse Representation Editing: An Interpretability-Driven Perspective

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Controllable Molecule Generation, Interpretability
TL;DR: We introduce SpaRE, an interpretability-driven approach for LLM-based controllable molecule generation, which produces chemically desirable molecules under complex constraints.
Abstract: Controllable molecule generation is crucial for diverse scientific applications, such as drug discovery and materials design. While large language models (LLMs) show great promise, their dense and entangled representations impede precise control over the generation of molecules with bespoke substructures or properties. To address this, we propose Sparse Representation Editing (SpaRE), an interpretability-driven framework for fine-grained and precise control in LLM-based molecule generation. The crux of SpaRE is to disentangle dense representations into various sparsely activated latent patterns that correspond to chemically meaningful concepts. Building on this, SpaRE enables direct manipulation of LLM representations associated with these concepts to achieve (1) local control, by generating target atoms and functional groups at specified positions; and (2) global control, by customizing the overall structural and physicochemical properties within defined ranges. In this way, our framework advances interpretability from post-hoc analysis to actionable generative control. Experiments demonstrate that SpaRE is capable of generating chemically desirable molecules under complex constraints in real-world scenarios, while providing causal insights for quantitative structure–property analysis. The code and demo are available at https://github.com/SpaRE-paper/SpaRE.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 13124