LDCformer: Incorporating Learnable Descriptive Convolution to Vision Transformer for Face Anti-Spoofing

Published: 01 Jan 2023, Last Modified: 12 Apr 2024, ICIP 2023
Abstract: Face anti-spoofing (FAS) aims to counter facial presentation attacks and relies heavily on identifying live/spoof discriminative features. While the vision transformer (ViT) has shown promising potential in recent FAS methods, there remains a lack of studies examining the value of incorporating local descriptive feature learning into ViT. In this paper, we propose a novel LDCformer that incorporates Learnable Descriptive Convolution (LDC) into ViT, aiming to learn distinguishing characteristics for FAS by modeling long-range dependencies of locally descriptive features. In addition, we propose to extend LDC to a Decoupled Learnable Descriptive Convolution (Decoupled-LDC) to improve optimization efficiency. With the new Decoupled-LDC, we further develop an extended model, LDCformer_D, for FAS. Extensive experiments on FAS benchmarks show that LDCformer_D outperforms previous methods on most protocols in both intra-domain and cross-domain testing. Code is available at https://github.com/Pei-KaiHuang/ICIP23_D-LDCformer.
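To make the core idea concrete, below is a minimal PyTorch sketch of a learnable descriptive convolution: a vanilla convolution paired with a second branch whose kernel is modulated by a learnable local descriptor, generalizing fixed central-difference-style operators. The class name, the descriptor parameterization (one shared weight per kernel tap), and the way the two branches are fused are illustrative assumptions, not the authors' implementation; the official code is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableDescriptiveConv(nn.Module):
    """Illustrative LDC sketch (hypothetical; not the official implementation).

    A vanilla 3x3 convolution is combined with a descriptor branch whose
    kernel reuses the base weights, element-wise scaled by a learnable
    per-tap descriptor. With a fixed descriptor encoding central
    differences, this would reduce to a CDC-like operator.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        # Learnable local descriptor: one weight per kernel tap,
        # shared across channels (an assumed parameterization).
        self.descriptor = nn.Parameter(torch.zeros(1, 1, kernel_size, kernel_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = self.conv(x)
        # Descriptor-modulated kernel: emphasizes learned local
        # pixel relations on top of the vanilla convolution.
        desc_kernel = self.conv.weight * self.descriptor
        desc = F.conv2d(x, desc_kernel, padding=self.conv.padding)
        return base + desc


if __name__ == "__main__":
    ldc = LearnableDescriptiveConv(3, 64)
    out = ldc(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 64, 224, 224])
```

In LDCformer, features from such a descriptive convolution would serve as the local input to the ViT, whose self-attention then models long-range dependencies over these locally descriptive features, as the abstract describes; the Decoupled-LDC variant is introduced there to improve optimization efficiency.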