Abstract: With the rapid development of Large Language Models (LLMs), a growing number of Large Visual-Language Models (LVLMs) have achieved unprecedented performance in response generation. Recent work shows that LVLMs are vulnerable to adversarial attacks. However, many existing methods overfit to the source model by overemphasizing model-specific features, which compromises their transferability, while other approaches suffer reduced attack effectiveness because they do not sufficiently differentiate between features. In this paper, we propose a novel transfer-based black-box untargeted attack, the Shared Adversarial Feature (SAF) dynamic attack. By analyzing the feature extraction patterns of LVLMs, we identify the features shared across models that are most susceptible to adversarial perturbation and disrupt them. Moreover, owing to their powerful attention mechanisms, LVLMs can still extract similar semantics from perturbed images even when the primary features are disrupted; we design a dynamic update strategy to address this challenge. Finally, from the perspective of SAF, we conduct an in-depth analysis of vulnerabilities in the vision encoder and projector within LVLMs, and find that attacking the projector yields stronger transferability across heterogeneous model architectures. Extensive experiments show that our method achieves superior attack performance compared to existing methods across different models, datasets, and tasks.
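To make the general idea behind feature-level transfer attacks concrete, the following is a minimal sketch, not the paper's SAF method: a PGD-style untargeted attack that pushes an image's features away from their clean values, averaged over an ensemble of surrogate encoders as a crude stand-in for features shared across models. All interfaces here (the `encoders` list, step sizes, and budget) are hypothetical illustration choices, and the paper's SAF selection and dynamic update strategy are not reproduced.

```python
import torch
import torch.nn.functional as F

def feature_disruption_attack(encoders, x, eps=8/255, alpha=1/255, steps=50):
    """Generic untargeted feature-level transfer attack (illustrative only).

    Maximizes the cosine distance between adversarial and clean features,
    averaged over several surrogate encoders, under an L_inf budget `eps`.
    This is NOT the SAF method from the paper; it is a common baseline
    pattern that the paper builds upon.
    """
    with torch.no_grad():
        clean_feats = [enc(x) for enc in encoders]  # reference clean features

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = 0.0
        for enc, f_clean in zip(encoders, clean_feats):
            f_adv = enc(x + delta)
            # Negative cosine similarity: ascending on this loss pushes the
            # adversarial features away from the clean features.
            loss = loss - F.cosine_similarity(
                f_adv.flatten(1), f_clean.flatten(1), dim=1).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # signed gradient ascent
            delta.clamp_(-eps, eps)                   # enforce L_inf budget
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()
```

Averaging the loss over multiple surrogates is one simple way to bias the perturbation toward features that several models rely on; the paper's contribution lies in identifying which shared features are most vulnerable and in dynamically updating the attack as the model recovers semantics through attention.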