Keywords: Vision Language Models, weight replacement attack, adversarial attack
Abstract: Vision language models (VLMs) excel at multimodal tasks such as image captioning and visual question answering, yet they remain vulnerable to input manipulation attacks (e.g., jailbreak and adversarial attacks). In contrast, the vulnerability of VLMs to adversarial weight perturbation remains largely underexplored. Our initial investigation reveals that VLMs are remarkably resilient to conventional weight corruption attacks that rely on memory fault injection (e.g., bit-flip attacks). Motivated by this observation, we propose VLM-PTA, the first successful adversarial weight perturbation attack against VLMs. Our attack leverages the page table attack (PTA), a well-established memory fault injection technique. In main memory, each weight block is a group of weights stored at a specific physical address; a single bit flip in a page frame number therefore replaces a victim weight block of a VLM with a substitute weight block. The key algorithmic challenge is that randomly injected weight replacements fail to degrade the model's performance. We therefore theoretically analyze the bottleneck of the PTA-based fault injection mechanism and propose Block-Flip, a novel estimation method that maximizes attack effectiveness and efficiency. Optimized to achieve its adversarial objectives at extremely low overhead while bypassing existing defenses, VLM-PTA is the most successful weight perturbation attack against VLMs to date.
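For intuition, a minimal sketch of the weight-block replacement mechanism the abstract describes, simulated in software: it assumes a 4 KB page, fp32 weights, and a hand-picked victim/substitute pair. The block size, layer shape, and pair choice are illustrative assumptions; the paper's Block-Flip selection method is not reproduced here.

```python
import torch

PAGE_BYTES = 4096        # typical 4 KB OS page (assumption)
BLOCK = PAGE_BYTES // 4  # fp32 weights per page -> 1024 weights per block

def replace_block(weights: torch.Tensor, victim: int, substitute: int) -> torch.Tensor:
    """Emulate one page-frame-number bit flip: reads of the victim
    weight block are served from the substitute block's physical page."""
    flat = weights.flatten().clone()
    flat[victim * BLOCK:(victim + 1) * BLOCK] = \
        flat[substitute * BLOCK:(substitute + 1) * BLOCK]
    return flat.view_as(weights)

# Toy demonstration on a single linear layer: one block replacement,
# measured by the deviation of the layer's output from the clean baseline.
torch.manual_seed(0)
layer = torch.nn.Linear(1024, 1024, bias=False)
x = torch.randn(1, 1024)
with torch.no_grad():
    baseline = layer(x)
    layer.weight.copy_(replace_block(layer.weight, victim=0, substitute=7))
    print("output L2 deviation:", (layer(x) - baseline).norm().item())
```

Note that the replacement reuses values already present in memory rather than writing arbitrary data, which is why a random victim/substitute pair often has negligible effect and a targeted selection method is needed.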
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 6017