Concept-Based Steering of LLMs for Conditional Molecular Generation

Jeremy Qin; Rushil Gupta; Boris Knyazev; Yan Zhang; Glen Berseth; Bang Liu

18 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: activation engineering, large language models, concept bottleneck, conditional molecular generation

TL;DR: We introduce a method based on activation engineering that provides an efficient, flexible, and scalable approach to improved conditional molecular generation.

Abstract: Generating valid, unique, and high-fidelity molecules while precisely controlling multiple properties simultaneously remains challenging. Prior works have achieved success by fine-tuning LLMs on novel molecular corpora, but they remain limited in scope. Real-world applications require generating molecules from unseen property distributions, a task on which fine-tuned models still struggle. To this end, we present Concept-based Activation STeering (CAST), the first approach to apply activation steering to directly edit a model's internal representation for conditional molecular generation. CAST offers a lightweight, flexible alternative to fine-tuning: a concept network computes property-conditioned steering vectors, and the LLM itself does not need to be retrained. Through extensive experiments on datasets such as Therapeutics Data Commons, we show that CAST consistently outperforms existing methods on both in-distribution and out-of-distribution conditional generation tasks. We also conduct comprehensive ablation studies to highlight the extent of control that our concept-guided steering provides over the molecules generated by the LLM.
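
As a rough illustration of the general idea of activation steering described in the abstract, the sketch below adds a property-conditioned vector to a frozen hidden state. It is not the CAST implementation: the names `concept_network` and `steer`, the single linear layer with `tanh`, the additive update, and the scale `alpha` are all assumptions made for this example.

```python
import math

def concept_network(properties, weights, bias):
    """Hypothetical concept network: one linear layer followed by tanh.

    `weights` holds one column per hidden dimension; each column has one
    entry per target property. Only these parameters would be trained;
    the LLM itself stays frozen (assumption for this sketch).
    """
    return [
        math.tanh(sum(p * w for p, w in zip(properties, column)) + b)
        for column, b in zip(weights, bias)
    ]

def steer(hidden, steering_vector, alpha=1.0):
    """Add the scaled steering vector to every token position."""
    return [
        [h + alpha * v for h, v in zip(row, steering_vector)]
        for row in hidden
    ]

# Toy setup: 2 target properties, hidden size 3, sequence of 2 tokens.
weights = [[0.2, -0.1], [0.0, 0.5], [-0.3, 0.3]]
bias = [0.0, 0.0, 0.0]
target_properties = [1.0, -2.0]
hidden = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

vector = concept_network(target_properties, weights, bias)
steered = steer(hidden, vector, alpha=2.0)
```

In this toy version the steering vector depends only on the requested property values, so changing the targets changes the edit applied to the activations without touching the model weights.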

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 10197
