Fine-Grained Textual Guidance for Generalized Multi-Modal Face Anti-Spoofing

Daiyuan Li, Zitong Yu, Jinwu Hu, Guohao Chen, Jinghui Zeng, Mingkui Tan

Published: 01 Jan 2025, Last Modified: 07 Jan 2026, IEEE Transactions on Information Forensics and Security, License: CC BY-SA 4.0
Abstract: Multi-modal face anti-spoofing (FAS) is crucial for defending against presentation attacks, especially with complex attack types and in high-security scenarios. However, existing multi-modal FAS methods have two main limitations: 1) most rely on classification supervision, which often fails to fully capture the distinctions between real faces and presentation attacks (PAs); 2) they depend solely on source-domain data covering a limited set of PA types, so their performance degrades significantly when they encounter unseen PA types and scenarios. To address these limitations, we propose a novel multi-modal fusion framework called Fine-grained Textual Guidance Multi-Modal Face Anti-Spoofing (FTG-FAS), which aligns natural language descriptions with multi-modal fused features to guide learning. Specifically, we propose a textual-guided token dropout module that selects semantically invariant patch tokens for multi-modal fusion, thereby enhancing the model’s generalization capability. For the testing phase, we propose FTG-FAS++, which leverages a self-distillation scheme with online source-free adaptation to further improve the model’s performance in unseen scenarios. Specifically, we establish a teacher-student distillation framework in which the teacher model is fed the complete image while the student model receives only the masked tokens. During adaptation, we minimize the prediction discrepancy between the teacher and the student in a unidirectional manner. Meanwhile, we propose a class-balanced sample selection strategy for stable source-free adaptation, which prevents the model from overfitting to either the real or the spoof class during tuning. Experiments show that FTG-FAS and FTG-FAS++ outperform state-of-the-art (SOTA) methods by 6.91% and 8.72% in AUC, respectively, on the cross-dataset leave-one-out protocols. Code will be available at https://github.com/iamcoming233/FTG-FAS.git
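The sketch below illustrates, in plain Python, two ideas named in the abstract. It is not the authors' implementation; every name and rule here is a hypothetical stand-in. First, `text_guided_token_keep` approximates textual-guided token dropout by keeping the patch tokens whose cosine similarity to a text embedding is highest (the similarity criterion and `keep_ratio` are assumptions). Second, `distillation_loss` approximates the FTG-FAS++ adaptation objective: teacher logits (assumed to come from the full image) serve as a fixed target for student logits (assumed to come from masked tokens), and the loss uses KL(teacher || student) averaged over a class-balanced subset chosen by `class_balanced_select`. The label convention (0 = real, 1 = spoof), the KL direction, and `per_class` are assumptions as well.

```python
import math
from typing import Dict, List, Sequence, Tuple

EPS = 1e-12  # numerical guard for log and division


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + EPS)


def text_guided_token_keep(patch_tokens: Sequence[Sequence[float]],
                           text_embedding: Sequence[float],
                           keep_ratio: float) -> List[int]:
    """Hypothetical token dropout: keep the indices of the patch tokens
    most similar to the text embedding, drop the rest."""
    scores = [cosine(t, text_embedding) for t in patch_tokens]
    k = max(1, int(round(keep_ratio * len(patch_tokens))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def class_balanced_select(teacher_probs: Sequence[Sequence[float]],
                          per_class: int) -> List[int]:
    """Keep at most `per_class` most confident samples for each teacher
    pseudo-label, so neither class dominates the adaptation batch."""
    by_class: Dict[int, List[Tuple[float, int]]] = {}
    for i, p in enumerate(teacher_probs):
        label = max(range(len(p)), key=p.__getitem__)
        by_class.setdefault(label, []).append((p[label], i))
    selected: List[int] = []
    for items in by_class.values():
        items.sort(reverse=True)  # highest confidence first
        selected.extend(i for _, i in items[:per_class])
    return sorted(selected)


def distillation_loss(teacher_logits: Sequence[Sequence[float]],
                      student_logits: Sequence[Sequence[float]],
                      per_class: int) -> Tuple[float, List[int]]:
    """Mean KL(teacher || student) over class-balanced samples.

    The teacher distribution is used only as a constant target
    ("unidirectional"); in an autograd framework it would be detached
    so that gradients flow into the student only.
    """
    t_probs = [softmax(l) for l in teacher_logits]
    s_probs = [softmax(l) for l in student_logits]
    idx = class_balanced_select(t_probs, per_class)
    if not idx:
        return 0.0, []
    total = 0.0
    for i in idx:
        total += sum(t * (math.log(t + EPS) - math.log(s + EPS))
                     for t, s in zip(t_probs[i], s_probs[i]))
    return total / len(idx), idx
```

As a usage example, with teacher logits `[[2.0, 0.0], [1.5, 0.0], [0.0, 3.0], [0.2, 0.0]]` and `per_class=1`, the teacher labels three samples as class 0 and one as class 1, yet the selection keeps one sample per class (indices 0 and 2). Balancing the selected set this way is one simple reading of how such a strategy could keep tuning from drifting toward a single class.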