Abstract: The generalization capability of a face anti-spoofing (FAS) model is critical to its practicality in the real world. Recent studies have shown, both theoretically and empirically, that neural networks tend to exploit easy-to-learn frequency sets for their decisions. These simplicity-biased representations, which depend on whatever best simplifies the training objective, may hamper generalization. This paper therefore focuses on mitigating the frequency shortcut learning of prior FAS models to improve generalization. Specifically, we introduce a frequency-aware autoencoder that retains more frequency details in intermediate features via reconstruction, facilitating a more comprehensive FAS judgment. Based on the encoder output, we propose a dynamic frequency masking mechanism that selects and suppresses the probable shortcut bands during training, encouraging the model to attend to under-explored frequencies. Moreover, we employ a style-inhibited modulation that weakens stylized information in frequency space, reducing the reliance on spurious style features. Experimental results on generalized FAS benchmarks verify the superiority of our framework over existing methods. Our code has been integrated into this project: https://github.com/VISION-SJTU/UniDefense.
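To make the idea of suppressing a frequency band concrete, the following is a minimal NumPy sketch, not the paper's implementation. It zeroes one radial band of a 2D feature map's centered Fourier spectrum and transforms back. The function names `radial_frequency_grid` and `mask_frequency_band`, the radial definition of a band, and the hard binary mask are all assumptions made for illustration; how the probable shortcut band is actually selected is not specified in the abstract, so the band limits are left to the caller.

```python
import numpy as np


def radial_frequency_grid(height, width):
    """Radial frequency index of every bin in a centered (fftshift-ed) spectrum."""
    # fftfreq(n) * n yields integer frequency indices; fftshift centers zero.
    fy = np.fft.fftshift(np.fft.fftfreq(height)) * height
    fx = np.fft.fftshift(np.fft.fftfreq(width)) * width
    return np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)


def mask_frequency_band(feature, r_low, r_high):
    """Zero all frequencies with r_low <= radius < r_high (hypothetical helper).

    `feature` is a real 2D array. The band limits are illustrative inputs;
    the selection rule used in the paper is not reproduced here.
    """
    height, width = feature.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feature))
    radius = radial_frequency_grid(height, width)
    keep = ~((radius >= r_low) & (radius < r_high))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * keep))
    # The mask is symmetric about the origin, so the result is real up to
    # floating-point error; drop the residual imaginary part.
    return filtered.real


if __name__ == "__main__":
    # Toy input: a constant offset plus a horizontal cosine at frequency 4.
    cols = np.arange(32)
    feature = 1.0 + np.cos(2 * np.pi * 4 * cols / 32)[None, :].repeat(32, axis=0)
    suppressed = mask_frequency_band(feature, r_low=3, r_high=5)
    print(np.abs(suppressed - 1.0).max())  # close to zero: only the offset remains
```

In this toy case, masking the band that contains the cosine leaves only the constant offset, while masking a band with no energy leaves the input unchanged. A training-time variant would presumably apply such a mask to intermediate features rather than raw inputs, but that detail is beyond what the abstract states.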