Keywords: Large Language Models, Chain of Thought, Privacy
TL;DR: Latent CoT invokes private knowledge without emitting it, and the target entity is still inferable from the final output—undermining content-only guardrails.
Abstract: Latent Chain-of-Thought (Latent CoT) enables reasoning in the continuous internal states of large language models (LLMs), allowing non-linguistic reasoning paths beyond token-level explicit CoT. This creates an implicit privacy risk: models can invoke and reason over private knowledge inside the latent chain, bypassing content guardrails and producing answers that causally depend on that knowledge without ever reproducing it. We formalize this risk as Private Implicit Knowledge Invocation (PIKI), defined as a non-verbatim causal dependence on private knowledge within an implicit chain. We introduce \textit{PIKI-Test}, a dataset of single- and multi-hop privacy questions for auditing Latent CoT LLMs. Using \textit{PIKI-Test}, we audit Latent CoT LLMs and evaluate content guardrails to study how private information propagates under Latent CoT. We also present \textit{PIKI-Attack}, which backtraces latent exposure, and \textit{PIKI-Solve}, a top-down hop decomposition with conservative decoding that reduces exposure and improves auditability. Across multiple models and guardrails, Latent CoT LLMs show about 56\% privacy exposure under multi-hop evaluation, and content guardrails suffer a 37\% drop in recall on multi-hop privacy QA. These results clarify the privacy risk of latent reasoning and establish a new audit target for safety-critical LLM deployments. Our code and dataset are available at [this link](https://anonymous.4open.science/r/PIKI-076D).
Privacy note: All privacy-sensitive data are synthetic; no real personally identifiable information (PII) is present.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 11104