Controlling Chat Style in Language Models via Single-Direction Editing

ACL ARR 2025 May Submission4464 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Controlling stylistic attributes in large language models (LLMs) remains challenging, with existing approaches relying on either prompt engineering or post-training alignment. We present a lightweight method for style control via vector editing. We show that stylistic features such as tone and language preference are encoded as linear directions in the model’s activation space. By extracting these style vectors and applying them directly to model weights, we achieve precise, training-free style control. The method supports linear style mixing and enhances safety by removing jailbreak-acceptance directions. Experiments across diverse models confirm high style adherence, preserved core capabilities, and minimal computational cost.
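The core idea described in the abstract — a style encoded as a single linear direction in activation space, extracted and then applied to weights — can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the difference-of-means extraction and the directional projection on a weight matrix are standard techniques assumed here for concreteness, and all function names and shapes are hypothetical.

```python
import numpy as np

def extract_style_direction(acts_styled: np.ndarray,
                            acts_neutral: np.ndarray) -> np.ndarray:
    """Difference-of-means style vector (assumed extraction recipe).

    acts_styled / acts_neutral: (n_examples, d_model) hidden activations
    collected on styled vs. neutral prompts. Returns a unit vector.
    """
    d = acts_styled.mean(axis=0) - acts_neutral.mean(axis=0)
    return d / np.linalg.norm(d)

def edit_weights(W: np.ndarray, direction: np.ndarray,
                 alpha: float = 1.0) -> np.ndarray:
    """Remove (alpha=1) or attenuate the style direction from W's outputs.

    For h = W @ x, returns W' = W - alpha * d (d^T W), so that with
    alpha=1 the output h' has no component along d (training-free edit).
    """
    return W - alpha * np.outer(direction, direction @ W)
```

With `alpha=1` the edited matrix produces outputs exactly orthogonal to the style direction; intermediate `alpha` values, or adding scaled directions for several styles, would correspond to the linear style mixing the abstract mentions.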
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: language models, controllable generation, vector editing, style control, model alignment
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models
Languages Studied: English, French, Italian, Portuguese, German, Chinese, Japanese
Submission Number: 4464