ReViT: Enhancing vision transformers feature diversity with attention residual connections

Anxhelo Diko, Danilo Avola, Marco Cascio, Luigi Cinque

Published: 01 Dec 2024 · Last Modified: 07 Nov 2025 · Pattern Recognition · CC BY-SA 4.0
Abstract: Highlights
• Vision transformers suffer from feature collapse in deeper layers.
• Residual attention counteracts feature collapse.
• Vision transformers with residual attention learn better representations.
• Residual attention improves the ViT's performance in visual recognition tasks.
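The highlights describe adding residual connections to the attention mechanism itself so that deeper layers retain attention information from earlier layers. The sketch below is a minimal illustration of that idea in PyTorch, assuming the residual connection blends the current layer's attention map with the previous layer's via a learnable weight (`alpha`); the class name, the mixing rule, and all parameters are illustrative assumptions, not the exact ReViT formulation.

```python
import torch
import torch.nn as nn

class ResidualAttention(nn.Module):
    """Self-attention with a hypothetical residual connection on the attention
    map, meant to illustrate how earlier-layer attention could counteract
    feature collapse in deep ViTs. Not the paper's exact formulation."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable weight balancing current vs. previous attention (assumed).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x, prev_attn=None):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Residual attention: mix in the previous layer's attention map.
        if prev_attn is not None:
            attn = self.alpha * attn + (1.0 - self.alpha) * prev_attn
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out), attn  # return attn so the next block can reuse it
```

In a full model, each block would pass its attention map to the next block's `prev_attn` argument, so attention patterns accumulate across depth instead of collapsing.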