Keywords: Model Inversion Attacks, VLM
TL;DR: We propose a suite of novel token-based and sequence-based model inversion strategies for VLMs
Abstract: Model inversion (MI) attacks pose significant privacy risks by reconstructing private training data from trained neural networks. While prior works have focused on conventional unimodal DNNs, the vulnerability of vision-language models (VLMs) remains underexplored. In this paper, we conduct the first study of VLMs' vulnerability to leaking private visual training data. To tailor our attacks to VLMs' token-based generative nature, we introduce four novel token-based and sequence-based model inversion strategies. In particular, we propose **Sequence-based Model Inversion with Adaptive Token Weighting (SMI-AW)**, based on our insight that not all tokens are equally informative for inversion. By dynamically reweighting token-level feedback according to each token's informativeness for inversion, SMI-AW achieves consistent improvements in reconstruction quality.
Through extensive experiments and a user study on three state-of-the-art VLMs and multiple datasets, we demonstrate, for the first time, that VLMs are susceptible to training data leakage. The experiments show that our proposed sequence-based methods, particularly SMI-AW combined with a logit-maximization loss based on vocabulary representation, achieve competitive reconstructions and outperform token-based methods in attack accuracy and visual similarity. Importantly, human evaluation of the reconstructed images yields an attack accuracy of 75.31%, underscoring the severity of model inversion threats in VLMs. Notably, we also demonstrate inversion attacks on publicly released VLMs. Our study reveals the privacy vulnerability of VLMs as they become increasingly popular across applications such as healthcare and finance.
**Our code, pretrained models, and reconstructed images are available in OpenReview’s discussion forum.**
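To make the adaptive token weighting idea from the abstract concrete, here is a minimal, hypothetical PyTorch sketch of a sequence-based inversion loss whose per-token terms are reweighted by an informativeness proxy. All names (`vlm`, `target_ids`, the loss-proportional weighting) are illustrative assumptions, not the authors' actual implementation or API.

```python
# Hypothetical sketch: sequence-based inversion loss with adaptive token weighting.
# Assumes `vlm(image, target_ids)` returns per-step logits [T, vocab_size]
# for the target token sequence under teacher forcing (illustrative interface).
import torch
import torch.nn.functional as F

def smi_aw_loss(vlm, image, target_ids):
    """Per-token cross-entropy reweighted by a token-informativeness proxy."""
    logits = vlm(image, target_ids)                        # [T, V]
    per_token_ce = F.cross_entropy(logits, target_ids,     # [T]
                                   reduction="none")
    # Illustrative informativeness proxy: tokens with higher loss carry more
    # inversion signal, so they receive larger weight. Weights are detached
    # so only the image (not the weighting) receives gradients.
    with torch.no_grad():
        weights = torch.softmax(per_token_ce, dim=0)       # [T], sums to 1
    return (weights * per_token_ce).sum()

# Sketch of the outer inversion loop (gradient descent on the candidate image):
# image = torch.randn(1, 3, 224, 224, requires_grad=True)
# opt = torch.optim.Adam([image], lr=0.05)
# for _ in range(num_steps):
#     opt.zero_grad()
#     smi_aw_loss(vlm, image, target_ids).backward()
#     opt.step()
```

The softmax-over-loss weighting above is only one possible realization of "adaptive token weighting"; the paper's actual informativeness measure and optimization setup may differ.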
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6349