Context Inference Attacks Without Jailbreaks
Keywords: Machine Learning, Security and Privacy
Abstract: Large Generative Models (LGMs) are increasingly deployed to process sensitive data at inference time, such as healthcare records or financial documents, which are often provided as part of a hidden \emph{context}. Prior work has highlighted privacy risks arising from such hidden contexts, primarily through \emph{jailbreaking} attacks that induce models to directly disclose sensitive content. However, we show that even when models are robust to jailbreaking and never reveal secrets verbatim, their outputs may still leak exploitable statistical signals about the hidden context. These signals enable an adversary to infer sensitive information from model responses alone, without any explicit prompt manipulation. We introduce a new class of attacks, which we term \emph{context-inference attacks}, in which an adversary leverages only benign queries and a weaker surrogate model to recover sensitive information from a stronger target model, thereby bypassing existing detection and filtering mechanisms. Our experiments demonstrate that in standard black-box settings our attack achieves up to 80% success on vision--language models (VLMs) and scales consistently as the query budget increases; in grey-box settings it achieves up to 100% success on both large language models (LLMs) and VLMs.
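As a rough illustration of the attack setting described in the abstract (not the paper's actual method), the sketch below shows one way a context-inference loop could look: the adversary issues benign queries to the target model holding the hidden context, then ranks candidate secrets by how well a weaker surrogate model explains the observed responses. The helpers `query_target` and `surrogate_score` are hypothetical placeholders, not functions from the submission.

```python
# Minimal sketch of a context-inference attack. Assumes two hypothetical
# helpers supplied by the adversary:
#   query_target(prompt)                  -> black-box response from the
#                                            target model with its hidden context
#   surrogate_score(context, prompt, resp) -> score from a weaker surrogate model
#                                             for how likely `resp` is when
#                                             `context` is present in the prompt

def infer_hidden_context(candidate_contexts, benign_prompts,
                         query_target, surrogate_score):
    """Return the candidate hidden context whose surrogate-predicted
    behavior best matches the target model's observed responses."""
    # 1. Collect target responses to benign queries (no jailbreaking involved).
    observations = [(p, query_target(p)) for p in benign_prompts]

    # 2. Score each candidate context by how well the surrogate explains
    #    the observed responses under that candidate.
    def total_score(candidate):
        return sum(surrogate_score(candidate, p, r) for p, r in observations)

    # 3. Pick the highest-scoring candidate as the inferred context.
    return max(candidate_contexts, key=total_score)
```

Increasing the number of benign prompts corresponds to the query budget discussed in the abstract; with more observations, the aggregate score separates candidates more reliably.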
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 189