$E(2)$-Equivariant Vision Transformer

Renjun Xu; Kaifan Yang; Ke Liu; Fengxiang He

Published: 08 May 2023, Last Modified: 15 Jun 2025, UAI 2023, Readers: Everyone

Keywords: group equivariant neural network, vision transformer, position encoding

TL;DR: We prove that previous attempts at designing group-equivariant ViTs are not effective in some cases, and we address this with a novel, effective equivariant positional encoding.

Abstract: Vision Transformer (ViT) has achieved remarkable performance in computer vision. However, positional encoding in ViT makes it substantially difficult to learn the intrinsic equivariance in data. Initial attempts have been made at designing equivariant ViTs, but in this paper we prove them defective in some cases. To address this issue, we design a Group Equivariant Vision Transformer (GE-ViT) via a novel, effective positional encoding operator. We prove that GE-ViT meets all the theoretical requirements of an equivariant neural network. Comprehensive experiments are conducted on standard benchmark datasets, demonstrating that GE-ViT significantly outperforms non-equivariant self-attention networks. The code is available at https://github.com/ZJUCDSYangKaifan/GEVit.

Supplementary Material: pdf

Other Supplementary Material: zip
