Dual PatchNorm

Manoj Kumar; Mostafa Dehghani; Neil Houlsby

Published: 09 May 2023, Last Modified: 17 Sept 2024

Accepted by TMLR

Authors that are also TMLR Expert Reviewers: ~Neil_Houlsby1

Abstract: We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), one before and one after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the result of an exhaustive search for alternative LayerNorm placement strategies in the Transformer block itself. In our experiments on image classification and contrastive learning, incorporating this trivial modification often leads to improved accuracy over well-tuned vanilla Vision Transformers and never hurts.
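As a rough illustration of the placement described in the abstract, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes that the first LayerNorm acts on the flattened pixels of each patch and the second on the embedded tokens. The helper names (`layer_norm`, `patchify`, `dual_patchnorm_embed`) are hypothetical, the learnable LayerNorm scale and bias are omitted for brevity, and the image size, patch size, and embedding width are arbitrary.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize over the last axis (learnable scale/bias omitted: an assumption)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def patchify(images, patch):
    """Split (B, H, W, C) images into (B, num_patches, patch * patch * C)."""
    b, h, w, c = images.shape
    x = images.reshape(b, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # group the two patch-grid axes together
    return x.reshape(b, (h // patch) * (w // patch), patch * patch * c)


def dual_patchnorm_embed(images, weight, bias, patch):
    """LayerNorm -> linear patch embedding -> LayerNorm."""
    x = patchify(images, patch)
    x = layer_norm(x)        # first LayerNorm: before the patch embedding
    x = x @ weight + bias    # linear patch embedding
    return layer_norm(x)     # second LayerNorm: after the patch embedding


# Hypothetical sizes chosen only for this example.
rng = np.random.default_rng(0)
patch, dim = 16, 64
images = rng.normal(size=(2, 32, 32, 3))
weight = rng.normal(size=(patch * patch * 3, dim)) * 0.02
bias = np.zeros(dim)

tokens = dual_patchnorm_embed(images, weight, bias, patch)
print(tokens.shape)
```

In this sketch the resulting tokens would then be passed to an otherwise unchanged Transformer encoder; steps such as adding position embeddings are left out.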

Certifications: Expert Certification

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
* [pmpk, thex]: Section 6.3, Semantic Segmentation
* [pmpk, thex]: Section 6.1, VTAB Finetuning
* [pmpk]: Table 1, DeIT + AugReg, High Res
* [pmpk]: Table 2, JFT Finetune Fixes
* [pmpk, thex, owpk]: Section 8.1, Gradient Norm Scale
* [thex]: Hyperparameters in Appendix.

Assigned Action Editor: ~Yunhe_Wang1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 848
