Style-conditional Prompt Token Learning for Generalizable Face Anti-spoofing

Published: 01 Jan 2024, Last Modified: 01 Mar 2025 · ACM Multimedia 2024 · CC BY-SA 4.0
Abstract: Face anti-spoofing (FAS) based on domain generalization (DG) has attracted increasing attention from researchers. Poor generalization arises because the model overfits to salient liveness-irrelevant signals. Previous methods alleviate this overfitting by mapping images from multiple domains into a common feature space, or by separating image features into domain-specific and task-related components. If the text features of vision-language pre-trained (VLP) models (e.g., CLIP) are instead used to dynamically adjust the image features, we can not only explore a wider feature space but also avoid the potential degradation of semantic information. Specifically, we propose Style-Conditional Prompt Token Learning (S-CPTL), a FAS method that generates generalized text features by training introduced prompt tokens to carry visual styles, and uses these features as classifier weights to improve the model's generalization. In contrast to inherently static prompt tokens, we propose dynamic prompt tokens, which adaptively capture liveness-irrelevant signals from instance-specific styles and increase their diversity through mixed feature statistics, further reducing the model's overfitting. Thorough experimental analysis demonstrates that S-CPTL exceeds current top-performing methods on four distinct cross-dataset benchmarks.
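The "mixed feature statistics" operation mentioned in the abstract can be illustrated with a minimal sketch. The assumption here is a MixStyle-like mechanism: each instance's style is summarized by per-channel mean and standard deviation of its feature maps, and styles are diversified by interpolating these statistics between randomly paired instances in a batch. This is an illustrative NumPy sketch of that general idea, not the authors' actual implementation; the function name and all parameters are hypothetical.

```python
import numpy as np

def mix_style_stats(x, lam=0.7, seed=0):
    """Mix per-instance, per-channel feature statistics across a batch
    (a MixStyle-like sketch of "mixed feature statistics"; hypothetical,
    not the S-CPTL authors' code).

    x:   feature maps of shape (batch, channels, height, width)
    lam: interpolation weight kept for the instance's own statistics
    """
    # Per-instance channel-wise style statistics.
    mu = x.mean(axis=(2, 3), keepdims=True)           # (B, C, 1, 1)
    sigma = x.std(axis=(2, 3), keepdims=True) + 1e-6  # (B, C, 1, 1)

    # Strip each instance's style by normalizing its features.
    normed = (x - mu) / sigma

    # Pair each instance with a random partner and interpolate styles.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(x.shape[0])
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]

    # Re-apply the mixed style to the normalized features.
    return normed * sigma_mix + mu_mix

# Usage: diversify the styles of a batch of feature maps.
x = np.random.default_rng(1).normal(size=(4, 3, 8, 8))
y = mix_style_stats(x)
assert y.shape == x.shape
```

With `lam=1.0` the operation reduces to the identity (each instance keeps its own statistics), so `lam` controls how strongly styles are perturbed during training.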