Manifold Learning for Adversarial Robustness: A Geometric Defense Framework for Vision-Language Models

TMLR Paper 6179 Authors

12 Oct 2025 (modified: 15 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Multimodal large language models (MLLMs) remain vulnerable to adversarial attacks that simultaneously manipulate image inputs and textual queries. Contemporary defense strategies rely on expensive adversarial training that requires attack generation during optimization, while lacking principled mathematical characterizations of the geometric manifold structure where multimodal embeddings reside. We introduce a Riemannian geometric framework that learns metric tensors to characterize clean feature geometry, detects adversarial perturbations via Ricci curvature analysis, corrects features through geodesic projection along shortest manifold paths, and suppresses adversarial regions using curvature-based attention mechanisms. Our approach provides defense through learned geometric invariants rather than memorized attack patterns, eliminating adversarial training requirements. Evaluation on VQA v2.0 with CLIP demonstrates 72.1% clean accuracy and 42.1–67.5% robust accuracy under diverse attacks including TextBugger, BAE, PGD-L∞, AutoAttack, and joint multimodal attacks, outperforming adversarial training baselines by 8.3–22.5% while requiring zero attack examples during training. Our framework establishes the first principled geometric approach to MLLM robustness, demonstrating that understanding manifold structure provides superior defense compared to attack memorization.
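As a rough illustration of the three defense stages the abstract names (curvature-based detection, geodesic correction, and attention suppression), the following is a minimal PyTorch sketch. The k-NN barycenter curvature proxy, the thresholding rule, and every function name below are hypothetical stand-ins for illustration only, not the paper's actual metric-tensor or Ricci-curvature machinery.

```python
# Minimal sketch of a geometric defense pipeline (illustrative assumptions only):
# 1) score features by a crude local-curvature proxy,
# 2) nudge flagged features back toward the local manifold,
# 3) down-weight attention to high-curvature (suspect) tokens.
import torch

def curvature_scores(feats, k=8):
    """Hypothetical proxy for Ricci-curvature analysis: score each feature
    by its deviation from the barycenter of its k nearest neighbors."""
    d = torch.cdist(feats, feats)                       # (N, N) pairwise distances
    knn = d.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match
    centroid = feats[knn].mean(dim=1)                   # local barycenter, (N, D)
    return (feats - centroid).norm(dim=-1)              # large value ~ off-manifold

def geodesic_correction(feats, scores, tau, steps=5, lr=0.5):
    """Crude surrogate for geodesic projection: iteratively pull flagged
    features toward their neighborhood barycenter in small steps."""
    corrected = feats.clone()
    flagged = scores > tau
    for _ in range(steps):
        d = torch.cdist(corrected, corrected)
        knn = d.topk(9, largest=False).indices[:, 1:]
        target = corrected[knn].mean(dim=1)
        corrected[flagged] += lr * (target[flagged] - corrected[flagged])
    return corrected

def suppress_attention(attn_logits, scores, alpha=1.0):
    """Curvature-based suppression: subtract a penalty from attention logits
    toward tokens with high curvature scores (broadcast over queries)."""
    return attn_logits - alpha * scores.unsqueeze(0)

if __name__ == "__main__":
    torch.manual_seed(0)
    feats = torch.randn(64, 512)             # e.g., CLIP patch/token embeddings
    feats[0] += 5.0                          # inject one off-manifold outlier
    s = curvature_scores(feats)
    tau = s.mean() + 2 * s.std()             # simple threshold (an assumption)
    cleaned = geodesic_correction(feats, s, tau)
    logits = feats @ feats.T / feats.shape[1] ** 0.5
    logits = suppress_attention(logits, s / s.max())
    print("flagged indices:", (s > tau).nonzero().flatten().tolist())
```

Note the design point this sketch mirrors from the abstract: every quantity is computed from clean-feature geometry alone, so no adversarial examples are needed at any stage of training.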
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Chao_Chen1
Submission Number: 6179