Jailbreak Large Vision-Language Models Through Multi-Modal Linkage

ACL ARR 2024 December Submission 436 Authors

13 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract: With the significant advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly. Previous studies have highlighted VLMs' vulnerability to jailbreak attacks, where carefully crafted inputs can lead the model to produce content that violates ethical and legal standards. However, existing jailbreak methods struggle against state-of-the-art VLMs such as GPT-4o, due to the over-exposure of harmful content and the lack of stealthy malicious guidance. In this work, we propose a novel jailbreak attack framework: the Multi-Modal Linkage (MML) Attack. Drawing inspiration from cryptography, MML uses an encryption-decryption process across the text and image modalities to mitigate over-exposure of malicious information. To covertly align the model's output with malicious intent, MML employs a technique called "evil alignment", which frames the attack within a video game production scenario. Comprehensive experiments demonstrate MML's effectiveness: it jailbreaks GPT-4o with attack success rates of 97.80% on SafeBench, 98.81% on MM-SafeBench, and 99.07% on HADES-Dataset. We will open-source the code and data in the public version of this manuscript.
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: security and privacy
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 436