Abstract: Vision Transformers have recently gained popularity due to their superior performance on visual computing tasks. However, this performance relies on training with huge datasets, and maintaining it on small datasets remains a challenge. Regularization helps to alleviate the overfitting that is common when dealing with small datasets. Most existing regularization techniques are designed with ConvNets in mind. Since Vision Transformers process images differently, there is a need for new regularization techniques crafted for them. In this paper, we propose a regularization technique called PatchSwap, which interchanges patches between two images, producing a new input for regularizing the transformer. Our extensive experiments show that PatchSwap yields superior performance to existing state-of-the-art methods. Further, the simplicity of PatchSwap makes its extension to a semi-supervised setting straightforward, requiring minimal effort.
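The abstract describes PatchSwap only at a high level. As a rough illustration of the patch-interchange idea, here is a minimal PyTorch sketch; the function name `patch_swap`, the `patch_size` value, and the `swap_ratio` parameter are assumptions for illustration, not details taken from the paper.

```python
import torch

def patch_swap(img_a, img_b, patch_size=16, swap_ratio=0.5):
    """Illustrative sketch: swap a random subset of non-overlapping
    patches between two images (NOT the paper's exact procedure).

    img_a, img_b: tensors of shape (C, H, W), H and W divisible
    by patch_size. Returns the mixed image and the fraction of
    patches taken from img_b, usable for mixing the two labels.
    """
    c, h, w = img_a.shape
    gh, gw = h // patch_size, w // patch_size   # patch-grid dimensions
    num_patches = gh * gw
    num_swap = int(num_patches * swap_ratio)

    # Randomly choose which patch positions to take from img_b.
    idx = torch.randperm(num_patches)[:num_swap]

    mixed = img_a.clone()
    for i in idx.tolist():
        row, col = divmod(i, gw)
        ys, xs = row * patch_size, col * patch_size
        mixed[:, ys:ys + patch_size, xs:xs + patch_size] = \
            img_b[:, ys:ys + patch_size, xs:xs + patch_size]

    return mixed, num_swap / num_patches
```

The returned swap fraction could be used to interpolate the two images' labels, in the spirit of CutMix-style mixing; the paper's actual label-assignment rule may differ.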