Privacy Risks of Intermediate Representations: Attribute Inference in Distributed LLM Inference

ACL ARR 2026 January Submission 4624 Authors

05 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: intermediate representations, attribute inference, privacy leakage, distributed inference, large language models, representation geometry
Abstract: Distributed LLM inference avoids transmitting raw inputs by sending intermediate hidden states instead, a practice widely assumed to preserve privacy. We challenge this assumption and show that intermediate representations alone suffice to leak sensitive user attributes. This setting defeats existing attribute inference attacks, which typically rely on auxiliary embedding-attribute pairs. To characterize this underexplored privacy risk, we reformulate attribute inference as zero-shot semantic similarity matching performed directly in the intermediate representation space, and introduce a purely intermediate-representation-based attribute inference attack, termed IR-AIA. Two structural challenges hinder attribute inference from intermediate representations: layer-dependent anisotropy in intermediate embeddings and subword-level semantic fragmentation. We address the former with SG-APCR and the latter with a sliding-window similarity matching strategy. Experiments across three LLMs and three real-world datasets show that sensitive attributes can be reliably inferred from intermediate representations alone, achieving Top-1 accuracy of up to 0.997 on CMS, 0.980 on Skytrax, and 0.986 on ECHR. These results reveal that intermediate states commonly considered safe to share can, on their own, expose sensitive personal attributes.
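To make the threat model concrete, below is a minimal illustrative sketch of the zero-shot similarity-matching idea the abstract describes: an adversary who intercepts only intermediate hidden states ranks candidate attribute values by cosine similarity against sliding windows of those states. Everything here is an assumption for illustration, not the authors' implementation: the model (`gpt2`), layer index, window width, and candidate list are hypothetical, and the paper's SG-APCR anisotropy correction is omitted.

```python
# Hedged sketch: zero-shot attribute inference from intermediate states.
# All concrete choices (model, LAYER, WINDOW, candidates) are hypothetical.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "gpt2"   # stand-in model; the paper evaluates three LLMs
LAYER = 6        # hypothetical intermediate layer intercepted in transit
WINDOW = 3       # sliding-window width, to counter subword fragmentation

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

@torch.no_grad()
def layer_states(text: str) -> torch.Tensor:
    """Hidden states of `text` at layer LAYER, shape (seq_len, dim)."""
    ids = tok(text, return_tensors="pt")
    return model(**ids).hidden_states[LAYER][0]

def window_mean(h: torch.Tensor, w: int) -> torch.Tensor:
    """Mean-pool every length-w sliding window of token states."""
    if h.size(0) < w:
        return h.mean(dim=0, keepdim=True)
    return torch.stack([h[i:i + w].mean(dim=0) for i in range(h.size(0) - w + 1)])

def infer_attribute(intercepted: torch.Tensor, candidates: list[str]) -> str:
    """Rank candidate attribute values by the max cosine similarity between
    windows of the intercepted states and windows of each candidate's states."""
    obs = torch.nn.functional.normalize(window_mean(intercepted, WINDOW), dim=-1)
    best, best_score = None, -1.0
    for cand in candidates:
        ref = torch.nn.functional.normalize(
            window_mean(layer_states(cand), WINDOW), dim=-1)
        score = (obs @ ref.T).max().item()
        if score > best_score:
            best, best_score = cand, score
    return best

# The "server" sees only hidden states, never the raw prompt.
h = layer_states("My doctor in Boston prescribed insulin for my diabetes.")
print(infer_attribute(h, ["diabetes", "asthma", "hypertension"]))
```

This sketch only shows why sliding-window matching in representation space is plausible as an attack primitive; without an anisotropy correction such as the paper's SG-APCR, raw similarities at intermediate layers would be far noisier than the reported results.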
Paper Type: Long
Research Area: Language Models
Research Area Keywords: security and privacy, red teaming, safety and alignment, robustness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4624