Abstract: Facial verification is an important task in biometrics, particularly as it is increasingly deployed on wearable devices. It is therefore essential to study technologies that can adapt to resource-limited environments. This poses a challenge to current research trends, as the deep learning models that achieve the best results typically have millions of parameters. Specifically, convolutional neural networks (CNNs) dominate the state of the art in computer vision, although transformer models have recently shown promise on image problems. Vision transformers (ViTs) have been successfully applied to various computer vision tasks, outperforming CNNs in classification and segmentation. However, as ViT models have a high number of parameters, it is crucial to investigate their lightweight variants. The Knowledge Distillation (KD) training paradigm transfers knowledge from a large teacher model to a smaller student model. In this study, we demonstrate how to train a lightweight version of ViT using KD, producing one of the most competitive lightweight models in the state of the art. Our analysis of ViT models for facial verification sheds light on their suitability for resource-constrained environments such as smartphones, smartwatches, and wearables of all kinds.
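The abstract mentions training a lightweight ViT student from a larger teacher via Knowledge Distillation. As a minimal, illustrative sketch (not the paper's actual training setup), the core of Hinton-style KD is a temperature-scaled soft-target loss; the function names and temperature value below are hypothetical choices for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # KL divergence between teacher and student soft targets,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl

# Identical logits give zero distillation loss.
print(round(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]), 6))  # 0.0
```

In practice this term is typically combined with a standard cross-entropy loss on the ground-truth labels, weighted by a hyperparameter.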