Chain-of-Sanitized-Thoughts: Reducing PII Leakage in Chain-of-Thought Reasoning

Arghyadeep Das; Sai Sreenivas Chintha; Sharvi Endait; Rishiraj Girmal; Kinjal Pandey

Published: 23 May 2026, Last Modified: 29 May 2026, Venue: ICML 2026 AIWILD, License: CC BY 4.0

Keywords: chain-of-thought reasoning, large language models, reasoning models, privacy, PII leakage, model safety, privacy-preserving NLP, model evaluation, prompt engineering, supervised fine-tuning, agentic systems, retrieval-augmented generation

TL;DR: We systematically show that chain-of-thought reasoning acts as a hidden channel for PII leakage in language models, and demonstrate that this leakage can be measured and significantly reduced, often without sacrificing task performance.

Abstract: Large Reasoning Models (LRMs) are increasingly deployed as components of LLM agents that perform multi-step interactions and maintain intermediate state. While their chain-of-thought (CoT) reasoning improves performance, it introduces a privacy risk: intermediate reasoning traces can expose personally identifiable information (PII) even when final answers are sanitized. In agent settings, these traces may be logged or propagated across components, creating a persistent attack surface. We study whether models can be induced to reason privately, rather than relying on post-hoc redaction. We introduce Chain-of-Sanitized-Thoughts, a privacy-first intervention that teaches LRMs to “think privately”. To support it, we also introduce PII-CoT-Bench, a supervised dataset and evaluation benchmark for privacy-aware reasoning under realistic and adversarial leakage scenarios. Our results reveal a capability-dependent pattern: stronger instruction-following models can suppress leakage through prompting alone, while weaker models require parameter updates to reliably avoid disclosure. Despite substantial reductions in leakage, utility remains largely preserved, indicating that privacy-preserving reasoning need not degrade task performance. These findings suggest that privacy-preserving reasoning can be improved at the model level, providing a practical path toward safer deployment of LLM agents.
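To make the risk described in the abstract concrete, the following minimal sketch checks whether PII appears in a reasoning trace even though the final answer is clean. It is an illustration only, not the paper's metric or the PII-CoT-Bench evaluation: the regex patterns, the function names `find_pii` and `leakage_report`, and the sample strings are all hypothetical, and real PII detection would need far broader coverage than two patterns.

```python
import re

# Hypothetical, deliberately narrow patterns (assumption: emails and
# US-style phone numbers only); not taken from the paper.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text):
    """Return a sorted list of (kind, matched_string) pairs found in text."""
    hits = set()
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.findall(text):
            hits.add((kind, match))
    return sorted(hits)


def leakage_report(reasoning_trace, final_answer):
    """Compare PII found in the reasoning trace with PII in the final answer."""
    trace_hits = find_pii(reasoning_trace)
    answer_hits = find_pii(final_answer)
    return {
        "trace": trace_hits,
        "answer": answer_hits,
        # PII that surfaces only in the intermediate reasoning.
        "trace_only": [h for h in trace_hits if h not in answer_hits],
    }


# Made-up example: the answer is sanitized, the trace is not.
trace = (
    "The user jane.doe@example.com called from 555-123-4567, "
    "so I should look up her order."
)
answer = "Your order has shipped."
report = leakage_report(trace, answer)
print(report["trace_only"])
```

In this sketch the final answer contains no matches while the trace contains both an email address and a phone number, which is the situation the abstract identifies as a leakage channel when traces are logged or passed between agent components.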

Track: Regular Paper (9 pages)

Submission Number: 86
