Keywords: reasoning models, instruction following, privacy
Abstract: Reasoning traces produced by reasoning models are difficult to control, which can lead to the unintended disclosure of private information even when models are explicitly instructed to avoid it. We propose training models to follow instructions not only in the final answer but also in the reasoning trace, potentially under different constraints for each. We further introduce a generation strategy that decouples reasoning and answer generation using separate LoRA adapters. To demonstrate this idea, we fine-tune models on a new instruction-following dataset with explicit restrictions on reasoning traces. We evaluate our approach on six models from two model families, ranging from 1.7B to 14B parameters, across two instruction-following benchmarks and two privacy benchmarks. Our method yields substantial improvements, achieving gains of up to 25.5 points in instruction-following performance and up to 50.31 percentage points on privacy benchmarks. These improvements, however, can come at the cost of task utility, owing to the trade-off between reasoning performance and instruction-following ability. Overall, our results show that improving instruction-following behavior in reasoning models can significantly enhance privacy, suggesting a promising direction for the development of future privacy-aware agents.
Paper Type: Long
Research Area: Language Models
Research Area Keywords: chain-of-thought, safety and alignment, security and privacy
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 449