Controlling Chat Style in Language Models via Single-Direction Editing

ACL ARR 2026 January Submission7455 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Large Language Models, Style Control, Interpretability

Abstract: Controlling stylistic attributes in large language models (LLMs) remains challenging, with existing approaches relying on either prompt engineering or post-training alignment. We investigate this challenge through the lens of representation engineering, testing the hypothesis that distinct stylistic attributes, from emotional tone to linguistic structure, are encoded as linear directions in the model's activation space. We provide strong empirical evidence for this hypothesis across a wide range of styles and, based on this finding, present a lightweight, training-free method for precise style control. Our approach supports linear style composition, enhances safety by ablating undesirable behaviors, and, as confirmed by experiments on over a dozen models, achieves high style adherence while preserving core capabilities at minimal computational cost.
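
The abstract does not say how a style direction is estimated or applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a difference-of-means estimator over activations (a common choice in representation engineering), and every function name and toy value here is hypothetical. It illustrates the three operations the abstract names: adding a direction to steer style, summing weighted directions to compose styles, and projecting a direction out to ablate a behavior.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def style_direction(styled_acts, neutral_acts):
    """ASSUMPTION: difference-of-means estimator, not taken from the paper."""
    mu_styled = mean_vector(styled_acts)
    mu_neutral = mean_vector(neutral_acts)
    return unit([s - n for s, n in zip(mu_styled, mu_neutral)])


def steer(h, directions, weights):
    """Linear composition: h + sum_i weights[i] * directions[i]."""
    out = list(h)
    for d, w in zip(directions, weights):
        out = [x + w * di for x, di in zip(out, d)]
    return out


def ablate(h, d):
    """Remove the component of h along unit direction d: h - (h . d) d."""
    c = dot(h, d)
    return [x - c * di for x, di in zip(h, d)]


# Toy 3-dimensional "activations" (hypothetical values for illustration only).
styled = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
neutral = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

d = style_direction(styled, neutral)
h = [0.5, 2.0, -1.0]
steered = steer(h, [d], [2.0])   # push h along the style direction
ablated = ablate(steered, d)     # strip that direction back out
```

In a real model these vectors would be hidden states read from, and written back to, a chosen layer during the forward pass; which layer and what steering weight to use are choices this sketch leaves open.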

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: Large Language Models, Style Control, Interpretability

Contribution Types: Model analysis & interpretability

Languages Studied: English, French, Italian, Portuguese, German, Chinese, Japanese

Submission Number: 7455
