Abstract: Vision Transformers have recently gained popularity due to their superior performance on visual computing tasks. However, this performance relies on training with huge datasets, and maintaining it on small datasets remains a challenge. Regularization helps to alleviate the overfitting that is common when dealing with small datasets. Most existing regularization techniques are designed with ConvNets in mind. Since Vision Transformers process images differently, there is a need for new regularization techniques crafted for them. In this paper, we propose a regularization technique called PatchSwap, which interchanges patches between two images, producing a new input for regularizing the transformer. Our extensive experiments show that PatchSwap yields superior performance to existing state-of-the-art methods. Further, the simplicity of PatchSwap makes its extension to a semi-supervised setting straightforward, requiring minimal effort.
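The abstract describes PatchSwap only at a high level. As a rough illustration of the patch-interchange idea, here is a minimal PyTorch sketch; the function name `patch_swap`, the `patch_size` value, and the `swap_ratio` parameter are assumptions for illustration, not details taken from the paper.

```python
import torch

def patch_swap(img_a, img_b, patch_size=16, swap_ratio=0.5):
    """Illustrative sketch: swap a random subset of non-overlapping
    patches between two images (NOT the paper's exact procedure).

    img_a, img_b: tensors of shape (C, H, W), H and W divisible
    by patch_size. Returns the mixed image and the fraction of
    patches taken from img_b, usable for mixing the two labels.
    """
    c, h, w = img_a.shape
    gh, gw = h // patch_size, w // patch_size   # patch-grid dimensions
    num_patches = gh * gw
    num_swap = int(num_patches * swap_ratio)

    # Randomly choose which patch positions to take from img_b.
    idx = torch.randperm(num_patches)[:num_swap]

    mixed = img_a.clone()
    for i in idx.tolist():
        row, col = divmod(i, gw)
        ys, xs = row * patch_size, col * patch_size
        mixed[:, ys:ys + patch_size, xs:xs + patch_size] = \
            img_b[:, ys:ys + patch_size, xs:xs + patch_size]

    return mixed, num_swap / num_patches
```

The returned swap fraction could be used to interpolate the two images' labels, in the spirit of CutMix-style mixing; the paper's actual label-assignment rule may differ.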