Certified Defense Against Cross-Modal Attacks in Multimodal LLMs via Semantic-Perceptual Abstractions
Abstract: Multimodal large language models (MLLMs) have revolutionized AI by enabling seamless
integration of vision and language understanding across diverse applications, from visual
question answering to image captioning. However, their cross-modal architecture introduces unique vulnerabilities to adversarial perturbations that exploit both text and image
modalities simultaneously. While existing defense mechanisms rely on empirical robustness through adversarial training, they lack formal guarantees against sophisticated cross-modal attacks. This paper introduces a novel certified defense framework based on hybrid
polytope-zonotope abstractions that provides provable robustness guarantees for MLLMs.
Our approach unifies discrete text perturbations with continuous image perturbations within
a single mathematical framework. Extensive evaluation on VQA v2.0 and Flickr30k across
MLLMs demonstrates 88.5% clean accuracy with 76.4–81.2% certified accuracy under large
perturbations, outperforming state-of-the-art baselines by 8.3% in certification rate and
6.7% in joint attack defense. This work establishes the first comprehensive certified defense
for MLLMs, advancing trustworthy multimodal AI systems.
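The abstract does not spell out the abstraction itself, so the following is only an illustrative sketch of the general idea: continuous image perturbations can be enclosed in a zonotope (an L-infinity ball is exactly a zonotope with one axis-aligned generator per pixel), while discrete text perturbations form a finite set that can be enumerated. The paper's actual hybrid polytope-zonotope construction is not reproduced here; names such as `encode_fn` and `certify_margin` are hypothetical placeholders, not the authors' API.

```python
import numpy as np

class Zonotope:
    """Zonotope {c + G @ eps : eps in [-1, 1]^k} over feature vectors."""
    def __init__(self, center, generators):
        self.c = np.asarray(center, dtype=float)      # shape (d,)
        self.G = np.asarray(generators, dtype=float)  # shape (d, k)

    @classmethod
    def from_linf_ball(cls, x, radius):
        # An L-infinity ball of the given radius around x is exactly a
        # zonotope with one axis-aligned generator per coordinate.
        x = np.asarray(x, dtype=float).ravel()
        return cls(x, radius * np.eye(x.size))

    def affine(self, W, b):
        # Affine layers map zonotopes to zonotopes exactly.
        return Zonotope(W @ self.c + b, W @ self.G)

    def bounds(self):
        # Interval enclosure: c +/- sum_i |g_i|.
        r = np.abs(self.G).sum(axis=1)
        return self.c - r, self.c + r


def certify_margin(image_zono, text_variants, encode_fn, target_class):
    # Discrete text perturbations (e.g. synonym swaps) are a finite set,
    # so a joint certificate can be checked by propagating one image
    # zonotope per admissible text variant (encode_fn is a hypothetical
    # abstract encoder returning a Zonotope over the output logits).
    for text in text_variants:
        lo, hi = encode_fn(image_zono, text).bounds()
        # Certified only if the target logit's lower bound beats every
        # other logit's upper bound under all perturbations.
        others = np.delete(hi, target_class)
        if lo[target_class] <= others.max():
            return False
    return True
```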
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=jYEojAfK0m
Changes Since Last Submission: The format is updated to the latest version.
Assigned Action Editor: ~Jonathan_Ullman1
Submission Number: 6144