Sufficient Vision Transformer

KDD 2022 (modified: 22 Sept 2022)
Abstract: Vision Transformer (ViT) and its variants have demonstrated promising performance on various computer vision tasks. Nevertheless, task-irrelevant information in patch tokens, such as background nuisance and noise, can degrade the performance of ViT-based models. In this paper, we develop the Sufficient Vision Transformer (Suf-ViT) as a new solution to this issue. We propose Sufficiency-Blocks (S-Blocks), applied across the depth of Suf-ViT, to accurately disentangle and discard task-irrelevant information. In addition, to guide the training of Suf-ViT, we formulate a Sufficient-Reduction Loss (SRLoss) based on Mutual Information (MI), which enables Suf-ViT to extract more reliable sufficient representations by removing task-irrelevant information. Extensive experiments on benchmark datasets such as ImageNet, ImageNet-C, and CIFAR-10 show that our method achieves state-of-the-art or competitive performance compared with baseline methods. Code is available at: https://github.com/zhicheng2T0/Sufficient-Vision-Transformer.git
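The abstract does not give SRLoss's exact form; the sketch below is only a rough illustration of one common way to realize an MI-based objective of this kind, namely a task loss plus a variational upper bound on I(z; x) as used in variational information bottleneck methods. The class name `SufficiencyLossSketch`, the weight `beta`, and the Gaussian-posterior inputs `z_mu`/`z_logvar` are all hypothetical and are not taken from the paper; consult the linked repository for the authors' actual formulation.

```python
# Hypothetical sketch (NOT the paper's SRLoss): a cross-entropy task term
# plus a KL-based compression term that upper-bounds I(z; x), in the spirit
# of an MI-reduction objective for removing task-irrelevant information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SufficiencyLossSketch(nn.Module):
    """Task loss + beta * KL(q(z|x) || N(0, I)).

    The KL term assumes the encoder outputs a diagonal-Gaussian posterior
    over the representation z (mean z_mu, log-variance z_logvar); shrinking
    it limits how much input information z retains, while the task term
    keeps z sufficient for the label. This realization is an assumption.
    """

    def __init__(self, beta: float = 1e-3):
        super().__init__()
        self.beta = beta  # trade-off between sufficiency and compression

    def forward(self, logits, targets, z_mu, z_logvar):
        task = F.cross_entropy(logits, targets)
        # KL(q(z|x) || N(0, I)) per sample, averaged over the batch.
        kl = -0.5 * (1 + z_logvar - z_mu.pow(2) - z_logvar.exp()).sum(dim=-1).mean()
        return task + self.beta * kl

# Example usage with dummy tensors (shapes are illustrative):
loss_fn = SufficiencyLossSketch(beta=1e-3)
logits = torch.randn(8, 10)            # class scores for 8 images, 10 classes
targets = torch.randint(0, 10, (8,))   # ground-truth labels
z_mu, z_logvar = torch.randn(8, 64), torch.zeros(8, 64)
loss = loss_fn(logits, targets, z_mu, z_logvar)
```

A larger `beta` discards more input information (stronger reduction) at the risk of harming task accuracy; in practice such a weight is tuned on a validation set.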