Towards Visualization-of-Thought Jailbreak Attack against Large Visual Language Models

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. NeurIPS 2025 poster. License: CC BY-NC 4.0
Keywords: jailbreak attack, large visual language models, visual thoughts
TL;DR: We propose VoTA, a novel attack framework that exploits the tension between logical reasoning and safety objectives in VLMs by generating chains of images with risky visual thoughts, achieving significantly higher success rates than existing methods.
Abstract: As Visual Language Models (VLMs) continue to evolve, they have demonstrated increasingly sophisticated logical reasoning capabilities and multimodal thought generation, opening doors to widespread applications. However, this advancement raises serious concerns about content security, particularly when these models process complex multimodal inputs that require intricate reasoning. Previous works often overlook the critical competition between the logical-reasoning and safety objectives of VLMs when such safety challenges arise. In this paper, we introduce the Visualization-of-Thought Attack (\textbf{VoTA}), a novel and automated attack framework that strategically constructs chains of images containing risky visual thoughts to challenge victim models. Our attack provokes the inherent conflict between the model's logical processing and its safety protocols, ultimately leading to the generation of unsafe content. Through comprehensive experiments, VoTA achieves remarkable effectiveness, improving the average attack success rate (ASR) by 26.71\% (from 63.70\% to 90.41\%) on 9 open-source and 6 commercial VLMs compared to state-of-the-art methods. These results expose a critical vulnerability: current VLMs struggle to maintain safety guarantees when processing insecure multimodal visualization-of-thought inputs, highlighting the urgency and necessity of enhancing safety alignment. Our code and dataset are available at https://github.com/Hongqiong12/VoTA. Content Warning: This paper contains harmful content that may be offensive.
Primary Area: Other
Submission Number: 28156