Keywords: Preference Control, Representation Editing, Large Language Models
Abstract: Precise attribute intensity control—generating Large Language Model (LLM) outputs with specific, user-defined attribute intensities—is crucial for building AI systems that adapt to diverse user expectations.
Current LLM alignment methods, however, typically provide only directional or open-ended guidance, failing to reliably achieve exact attribute intensities.
We address this limitation with three key designs: (1) reformulating precise attribute intensity control as a target-reaching problem, rather than simple maximization; (2) training a lightweight value function via temporal-difference learning to predict final attribute intensity scores from partial generations, thereby steering LLM outputs; and (3) employing gradient-based interventions on hidden representations to navigate the model precisely towards specific attribute intensity targets.
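As a minimal illustration of designs (2) and (3), the sketch below uses a linear value head as a stand-in for the TD-learned value function and applies gradient descent on a hidden representation to drive the predicted intensity toward a target. All names, dimensions, and step sizes here are illustrative assumptions, not the paper's implementation.

```python
import torch

# Hypothetical dimensions and hyperparameters (assumptions for illustration).
HIDDEN_DIM = 16

# Lightweight value head standing in for the TD-trained value function:
# it maps a partial generation's hidden state to a predicted final
# attribute intensity score.
torch.manual_seed(0)
value_head = torch.nn.Linear(HIDDEN_DIM, 1)

def steer_hidden(hidden: torch.Tensor, target: float,
                 steps: int = 50, lr: float = 0.1) -> torch.Tensor:
    """Gradient-based intervention: nudge the hidden representation so the
    predicted attribute intensity reaches `target` (target-reaching, not
    maximization)."""
    h = hidden.clone().detach().requires_grad_(True)
    for _ in range(steps):
        pred = value_head(h)
        loss = (pred - target).pow(2).mean()  # squared distance to the target
        loss.backward()
        with torch.no_grad():
            h -= lr * h.grad       # move the hidden state down the gradient
        h.grad.zero_()
    return h.detach()

h0 = torch.randn(1, HIDDEN_DIM)        # stand-in hidden state
h1 = steer_hidden(h0, target=0.8)
before = value_head(h0).item()
after = value_head(h1).item()
```

Because the loss penalizes distance to the target rather than rewarding larger values, the same routine can steer intensity down as well as up; in the full method the intervention would be applied to the LLM's hidden representations during decoding.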
Our method enables fine-grained, continuous control over attribute intensities, moving beyond simple directional alignment.
Experiments on \llama and \PHI confirm our method's ability to steer text generation to user-specified attribute intensities with high accuracy.
Finally, we demonstrate efficiency enhancements across three downstream tasks: preference data synthesis, Pareto frontier approximation and optimization, and distillation of aligned behaviors for intervention-free inference. Our code is available on \href{https://anonymous.4open.science/r/pre-control-F482}{https://anonymous.4open.science/r/pre-control-F482}.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6268