TISR: Breaking Vision–Language Models via Text–Image Segmentation and Reassembly

ICLR 2026 Conference Submission16747 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: LVLM attack; jailbreak; reasoning chain; prompt segmentation
Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal understanding capabilities, yet their inherent security vulnerabilities remain a pressing concern. Prior studies have indicated that extended chain-of-thought reasoning can inadvertently facilitate jailbreaks; however, existing attempts often rely on image masking, which only modestly extends reasoning depth and performs poorly against advanced closed-source commercial models. To bridge this gap, we propose an effective jailbreak method, Text–Image Segmentation and Reassembly (TISR). TISR disperses harmful semantic fragments across the visual modality while employing text-based role-playing to guide their reassembly. This strategic organization conceals surface-level toxicity while preserving malicious intent, thereby extending LVLM reasoning chains and enabling successful jailbreaks. Our experiments show that LVLMs, leveraging their visual understanding and reasoning abilities, reconstruct the complete harmful semantics while failing to uphold safety constraints. Extensive evaluations across a wide range of LVLMs demonstrate that TISR achieves state-of-the-art attack performance, surpassing prior multimodal jailbreak methods. Specifically, it attains average Attack Success Rates (ASR) exceeding 90\% across three benchmark datasets on four leading commercial LVLMs.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 16747