Keywords: Face anti-spoofing, Multimodal
Abstract: Face anti-spoofing is essential for securing face recognition in applications such as payments, border control, and surveillance. Current multimodal methods often degrade under domain shift and modality bias. We address these challenges with a Multimodal Denoising and Alignment (MMDA) framework built around two complementary threads: denoising and alignment. Using a pretrained CLIP backbone, the Modality–Domain Joint Differential Attention (MD2A) module suppresses modality and domain noise at fusion, producing cleaner representations that lay the groundwork for alignment. The Representation Space Soft Alignment (RS2) module then maps the fused representation to text-defined class subspaces rather than a single prompt, preserving semantics while improving class separability and cross-domain consistency. Finally, the U-shaped Dual Space Adaptation (U-DSA) module applies alignment across layers and feeds deep information back to shallow layers, preserving pretrained semantics while adding task-specific capacity. The three components act jointly: denoising stabilizes alignment, alignment tightens decision boundaries, and U-DSA consolidates and propagates these gains, yielding stronger multimodal domain generalization. MMDA attains state-of-the-art results on four public datasets under multiple protocols; under the complete-modality setting, it reduces HTER by 9.63\% and improves AUC by 5.98\% over the strongest prior method.
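The abstract describes the three modules only at a high level; the sketch below shows one minimal way such a pipeline could be wired together. It is an illustrative assumption throughout: the module internals (the differential attention branches, the prompt-weighting scheme, the adapter layers), the feature dimensions, and the three-modality setup are not taken from the paper, only the names MD2A, RS2, and U-DSA and their described roles are.

```python
# Minimal, self-contained sketch of an MMDA-style pipeline (PyTorch).
# All internals below are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MD2A(nn.Module):
    """Modality-Domain Joint Differential Attention (illustrative stand-in).

    Fuses per-modality CLIP features while subtracting a second attention
    branch intended to capture modality/domain noise ("differential" denoising).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.attn_signal = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.attn_noise = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, dim), e.g. RGB / depth / IR tokens
        signal, _ = self.attn_signal(feats, feats, feats)
        noise, _ = self.attn_noise(feats, feats, feats)
        fused = self.norm(feats + signal - noise)  # suppress the noise branch
        return fused.mean(dim=1)                   # (batch, dim)


class RS2(nn.Module):
    """Representation Space Soft Alignment (illustrative stand-in).

    Aligns the fused visual feature with a class *subspace* spanned by several
    text prompts per class, rather than a single prompt, and returns logits.
    """

    def __init__(self, text_embeds: torch.Tensor, temperature: float = 0.07):
        super().__init__()
        # text_embeds: (num_classes, prompts_per_class, dim), e.g. CLIP text features
        self.register_buffer("text_embeds", F.normalize(text_embeds, dim=-1))
        self.temperature = temperature

    def forward(self, visual: torch.Tensor) -> torch.Tensor:
        v = F.normalize(visual, dim=-1)                           # (batch, dim)
        sims = torch.einsum("bd,cpd->bcp", v, self.text_embeds)   # per-prompt similarity
        weights = sims.softmax(dim=-1)                            # soft weights within each class subspace
        return (weights * sims).sum(dim=-1) / self.temperature    # (batch, num_classes)


class UDSA(nn.Module):
    """U-shaped Dual Space Adaptation (illustrative stand-in).

    Feeds deep-layer information back to shallow layers through lightweight
    adapters, leaving the pretrained backbone itself untouched.
    """

    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        self.down = nn.ModuleList(nn.Linear(dim, dim // 4) for _ in range(num_layers))
        self.up = nn.ModuleList(nn.Linear(dim // 4, dim) for _ in range(num_layers))

    def forward(self, layer_feats: list) -> torch.Tensor:
        # layer_feats: shallow-to-deep features, each (batch, dim)
        carry = torch.zeros_like(layer_feats[-1])
        # Walk from the deepest layer back to the shallowest (the "U" return path).
        for i in reversed(range(len(layer_feats))):
            carry = layer_feats[i] + self.up[i](F.gelu(self.down[i](layer_feats[i] + carry)))
        return carry


if __name__ == "__main__":
    dim, batch = 512, 4
    md2a, udsa = MD2A(dim), UDSA(dim, num_layers=3)
    rs2 = RS2(torch.randn(2, 5, dim))         # 2 classes (live/spoof), 5 prompts each
    modal_feats = torch.randn(batch, 3, dim)  # RGB / depth / IR features
    fused = md2a(modal_feats)
    refined = udsa([torch.randn(batch, dim), torch.randn(batch, dim), fused])
    print(rs2(refined).shape)                 # torch.Size([4, 2])
```

The subtraction of a dedicated noise-attention branch and the soft weighting over multiple prompts per class are chosen here only to mirror the "denoising" and "subspace alignment" ideas the abstract names; the paper's actual formulations may differ substantially.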
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16072