2022 (modified: 18 Nov 2022) | ICLR 2022 | Readers: Everyone
Abstract: Vision Transformer (ViT) and its variants (e.g., Swin, PVT) have achieved great success in various computer vision tasks, owing to their capability to learn long-range contextual information. Layer...