Group Convolutional Self-Attention for Roto-Translation Equivariance in ViTs

Published: 23 Sept 2025 · Last Modified: 29 Oct 2025 · NeurReps 2025 Poster · CC BY 4.0
Keywords: Roto-translation Equivariance, Group Convolutional Self-Attention, Equivariant Transformers
TL;DR: Roto-translation equivariant ViTs without the need for (equivariance-preserving) position encodings.
Abstract: We propose a discrete roto-translation group equivariant self-attention mechanism that requires no position encoding, built from a convolutional patch embedding and convolutional self-attention. We examine the challenges involved in achieving equivariance in vision transformers (ViTs) and propose a simpler way to implement discretized roto-translation group equivariant ViTs. Experimental results demonstrate that our approach performs competitively with existing approaches to roto-translation equivariant ViTs.
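To make the idea concrete, below is a minimal PyTorch sketch of the mechanism as the abstract describes it, assuming a p4 discretization (translations plus 90° rotations): a lifting convolution plays the role of the convolutional patch embedding, and self-attention uses pointwise convolutional Q/K/V projections with no position encoding, so equivariance follows from the permutation equivariance of attention. The class names `P4LiftingConv` and `GroupSelfAttention` and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class P4LiftingConv(nn.Module):
    """Lifting convolution: correlates the input with four rotated copies of
    one kernel, producing a feature map on the p4 group, shape (B, C, 4, H, W)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x):  # x: (B, C_in, H, W)
        pad = self.weight.shape[-1] // 2
        outs = [F.conv2d(x, torch.rot90(self.weight, r, dims=(2, 3)), padding=pad)
                for r in range(4)]          # one response per 90-degree rotation
        return torch.stack(outs, dim=2)     # (B, C_out, 4, H, W)


class GroupSelfAttention(nn.Module):
    """Content-only self-attention over the p4 feature map. Q, K, V come from
    pointwise convolutions and there is no position encoding, so the layer
    commutes with any permutation of the 4*H*W positions -- in particular with
    the roto-translation action (spatial rotation + cyclic shift of group axis)."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv3d(ch, ch, kernel_size=1)
        self.k = nn.Conv3d(ch, ch, kernel_size=1)
        self.v = nn.Conv3d(ch, ch, kernel_size=1)
        self.scale = ch ** -0.5

    def forward(self, x):  # x: (B, C, 4, H, W)
        B, C, G, H, W = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)  # (B, N, C) with N = 4*H*W
        k = self.k(x).flatten(2).transpose(1, 2)
        v = self.v(x).flatten(2).transpose(1, 2)
        attn = torch.softmax((q @ k.transpose(1, 2)) * self.scale, dim=-1)
        return (attn @ v).transpose(1, 2).reshape(B, C, G, H, W)


# Equivariance check: a 90-degree input rotation should rotate the output
# spatially and cyclically shift the group axis by one step.
x = torch.randn(1, 1, 8, 8)
net = nn.Sequential(P4LiftingConv(1, 8), GroupSelfAttention(8))
y = net(x)
y_rot = net(torch.rot90(x, 1, dims=(2, 3)))
expected = torch.roll(torch.rot90(y, 1, dims=(3, 4)), shifts=1, dims=2)
print(torch.allclose(y_rot, expected, atol=1e-4))  # True
```

The check at the bottom illustrates why no position encoding is needed in this sketch: since every layer is either convolutional or a content-only attention over positions, a roto-translation of the input simply permutes the output in the corresponding way.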
Submission Number: 25