Context Inference Attacks Without Jailbreaks
Keywords: Machine Learning, Security and Privacy
Abstract: Large Generative Models (LGMs) are increasingly deployed to process sensitive data at inference time, such as healthcare records or financial documents, which are often provided as part of a hidden \emph{context}. Prior work has highlighted privacy risks arising from such hidden contexts, primarily through \emph{jailbreaking} attacks that induce models to directly disclose sensitive content. However, we show that even when models are robust to jailbreaking and never reveal secrets verbatim, their outputs may still leak exploitable statistical signals about the hidden context. These signals enable an adversary to infer sensitive information from model responses alone, without any explicit prompt manipulation. We introduce a new class of attacks, which we term \emph{context-inference attacks}, in which an adversary leverages only benign queries and a weaker surrogate model to recover sensitive information from a stronger target model, thereby bypassing existing detection and filtering mechanisms. Our experiments demonstrate that in standard black-box settings our attack achieves up to 80% success on vision--language models (VLMs) and scales consistently as the query budget increases; in grey-box settings it achieves up to 100% success on both large language models (LLMs) and VLMs.
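As a rough illustration of the attack setting described in the abstract (not the paper's actual method), the sketch below shows one way a context-inference loop could look: the adversary issues benign queries to the target model holding the hidden context, then ranks candidate secrets by how well a weaker surrogate model explains the observed responses. The helpers `query_target` and `surrogate_score` are hypothetical placeholders, not functions from the submission.

```python
# Minimal sketch of a context-inference attack. Assumes two hypothetical
# helpers supplied by the adversary:
#   query_target(prompt)                  -> black-box response from the
#                                            target model with its hidden context
#   surrogate_score(context, prompt, resp) -> score from a weaker surrogate model
#                                             for how likely `resp` is when
#                                             `context` is present in the prompt

def infer_hidden_context(candidate_contexts, benign_prompts,
                         query_target, surrogate_score):
    """Return the candidate hidden context whose surrogate-predicted
    behavior best matches the target model's observed responses."""
    # 1. Collect target responses to benign queries (no jailbreaking involved).
    observations = [(p, query_target(p)) for p in benign_prompts]

    # 2. Score each candidate context by how well the surrogate explains
    #    the observed responses under that candidate.
    def total_score(candidate):
        return sum(surrogate_score(candidate, p, r) for p, r in observations)

    # 3. Pick the highest-scoring candidate as the inferred context.
    return max(candidate_contexts, key=total_score)
```

Increasing the number of benign prompts corresponds to the query budget discussed in the abstract; with more observations, the aggregate score separates candidates more reliably.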
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 189