Manifold Learning for Adversarial Robustness: A Geometric Defense Framework for Vision-Language Models
Abstract: Multimodal large language models (MLLMs) remain vulnerable to adversarial attacks that simultaneously manipulate image inputs and textual queries. Contemporary defense strategies rely on expensive adversarial training that requires attack generation during optimization, and they lack a principled mathematical characterization of the geometric manifold structure in which multimodal embeddings reside. We introduce a Riemannian geometric framework that learns metric tensors to characterize clean feature geometry, detects adversarial perturbations via Ricci curvature analysis, corrects features through geodesic projection along shortest manifold paths, and suppresses adversarial regions using curvature-based attention mechanisms. Our approach provides defense through learned geometric invariants rather than memorized attack patterns, eliminating the need for adversarial training. Evaluation on VQA v2.0 with CLIP demonstrates 72.1% clean accuracy and 42.1–67.5% robust accuracy under diverse attacks, including TextBugger, BAE, PGD-L∞, AutoAttack, and joint multimodal attacks, outperforming adversarial-training baselines by 8.3–22.5% while requiring zero attack examples during training. Our framework establishes the first principled geometric approach to MLLM robustness, demonstrating that understanding manifold structure provides a stronger defense than attack memorization.
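As a rough illustration of the curvature-based detection idea only (the paper's learned metric tensors and Ricci analysis are not specified in this abstract), the sketch below scores embeddings with the simplified combinatorial Forman-Ricci edge curvature 4 − deg(u) − deg(v) on a k-nearest-neighbor graph and compares the score distribution of clean versus perturbed points. The graph builder, the choice of k, and the synthetic perturbation are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def knn_edges(X, k=8):
    """Build undirected kNN edges over the rows of X (n, d)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]   # k nearest rows per point
    return sorted({tuple(sorted((i, int(j)))) for i in range(len(X)) for j in nbrs[i]})

def forman_node_scores(X, k=8):
    """Per-node mean of the simplified Forman-Ricci edge curvature 4 - deg(u) - deg(v)."""
    edges = knn_edges(X, k)
    deg = np.zeros(len(X))
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    score, cnt = np.zeros(len(X)), np.zeros(len(X))
    for u, v in edges:
        c = 4.0 - deg[u] - deg[v]          # combinatorial Forman curvature of edge (u, v)
        score[u] += c; score[v] += c
        cnt[u] += 1; cnt[v] += 1
    return score / np.maximum(cnt, 1)

# Toy demo: a pronounced shift in local curvature statistics is the
# detection signal; the perturbation magnitude here is arbitrary.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 32))
pert = clean[:10] + rng.normal(scale=4.0, size=(10, 32))  # toy "adversarial" shift
scores = forman_node_scores(np.vstack([clean, pert]), k=8)
print("mean curvature, clean:     %.2f" % scores[:200].mean())
print("mean curvature, perturbed: %.2f" % scores[200:].mean())
```

In a full pipeline one would calibrate the score distribution on clean embeddings and flag test points whose local curvature deviates beyond a chosen threshold; the geodesic-projection correction step would then pull flagged features back toward the clean manifold.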
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Chao_Chen1
Submission Number: 6179