Certified Defense Against Cross-Modal Attacks in Multimodal LLMs via Semantic-Perceptual Abstractions

TMLR Paper6144 Authors

08 Oct 2025 (modified: 11 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Multimodal large language models (MLLMs) have revolutionized AI by enabling seamless integration of vision and language understanding across diverse applications, from visual question answering to image captioning. However, their cross-modal architecture introduces unique vulnerabilities to adversarial perturbations that exploit both text and image modalities simultaneously. While existing defense mechanisms rely on empirical robustness through adversarial training, they lack formal guarantees against sophisticated cross-modal attacks. This paper introduces a novel certified defense framework based on hybrid polytope-zonotope abstractions that provides provable robustness guarantees for MLLMs. Our approach unifies discrete text perturbations with continuous image perturbations within a single mathematical framework. Extensive evaluation on VQA v2.0 and Flickr30k across multiple MLLMs demonstrates 88.5% clean accuracy with 76.4–81.2% certified accuracy under large perturbations, outperforming state-of-the-art baselines by 8.3% in certification rate and 6.7% in joint attack defense. This work establishes the first comprehensive certified defense for MLLMs, advancing trustworthy multimodal AI systems.
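To make the zonotope side of the abstraction concrete, the sketch below shows generic zonotope bound propagation through an affine layer, the standard building block such certified defenses rely on. It is a minimal illustration under assumed conventions, not the paper's actual framework; the `Zonotope` class, its methods, and the example numbers are hypothetical.

```python
import numpy as np

# Minimal sketch of zonotope bound propagation through an affine layer.
# All names here are illustrative and not taken from the paper.

class Zonotope:
    """Represents the set {center + G @ eps : eps in [-1, 1]^k}."""
    def __init__(self, center, generators):
        self.center = np.asarray(center, dtype=float)           # shape (d,)
        self.generators = np.asarray(generators, dtype=float)   # shape (d, k)

    @classmethod
    def from_linf_ball(cls, x, radius):
        # An L_inf ball of the given radius around x is an axis-aligned zonotope.
        x = np.asarray(x, dtype=float)
        return cls(x, np.eye(x.size) * radius)

    def affine(self, W, b):
        # Affine maps are exact on zonotopes: W(c + G eps) + b = (Wc + b) + (W G) eps.
        return Zonotope(W @ self.center + b, W @ self.generators)

    def bounds(self):
        # Element-wise lower/upper bounds over all eps in [-1, 1]^k.
        slack = np.abs(self.generators).sum(axis=1)
        return self.center - slack, self.center + slack


# Usage: bound the outputs of a linear scoring head over a perturbed input.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 4)), np.zeros(2)
z = Zonotope.from_linf_ball(x=[0.5, -0.2, 0.1, 0.3], radius=8 / 255)
lo, hi = z.affine(W, b).bounds()
# If the lower bound of the true-class score exceeds the upper bound of every
# other class score, the prediction is certified for this perturbation set.
print(lo, hi)
```

In a hybrid scheme of the kind the abstract describes, a discrete (e.g., polytope-based) abstraction over text substitutions would be propagated alongside such a continuous image abstraction and combined before certification; that joint construction is the paper's contribution and is not reproduced here.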
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=jYEojAfK0m
Changes Since Last Submission: The format is updated to the latest version.
Assigned Action Editor: ~Jonathan_Ullman1
Submission Number: 6144