MVFormer: Diversifying feature normalization and token mixing for efficient vision transformers

Published: 2025. Last Modified: 11 Nov 2025. Venue: Pattern Recognit. Lett. 2025. License: CC BY-SA 4.0.
Abstract: Research highlights
• We propose MVFormer for diverse feature learning via token mixers and normalization.
• The MVN combines three types of normalization to reflect diverse feature distributions.
• The MVTM enables stage specificity by diversifying receptive fields per stage.
• Adopting the MVN and MVTM together enhances the capacity for diverse viewpoints.
• MVFormer surpasses existing convolution-based ViTs on the ImageNet-1K benchmark.
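The highlights describe MVN and MVTM only at a high level. The NumPy sketch below illustrates the two ideas under stated assumptions: the three normalizations are assumed to be batch, layer, and instance normalization mixed with softmax weights, and the token mixer is assumed to be a channel-split depthwise convolution that gives each channel group its own kernel size. The function names, the weighting scheme, and the kernel layout are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def _standardize(x, axes, eps=1e-5):
    """Zero-mean, unit-variance normalization over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_view_norm(x, logits, eps=1e-5):
    """Hypothetical MVN-style layer for x of shape (N, C, H, W).

    Assumption: the three normalizations are mixed with softmax(logits)
    (learnable in a real model; fixed here). Affine parameters are omitted.
      batch norm    -> statistics over (N, H, W) per channel
      layer norm    -> statistics over (C, H, W) per sample
      instance norm -> statistics over (H, W) per sample and channel
    """
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()
    bn = _standardize(x, (0, 2, 3), eps)
    ln = _standardize(x, (1, 2, 3), eps)
    inorm = _standardize(x, (2, 3), eps)
    return weights[0] * bn + weights[1] * ln + weights[2] * inorm


def depthwise_conv2d(x, kernels):
    """Depthwise 'same' convolution (zero padding).

    x: (N, C, H, W); kernels: (C, k, k) with k odd, one kernel per channel.
    """
    _, _, height, width = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    out = np.zeros_like(x, dtype=float)
    # Accumulate each kernel tap times the correspondingly shifted input.
    for i in range(k):
        for j in range(k):
            out += kernels[None, :, i, j, None, None] * xp[:, :, i:i + height, j:j + width]
    return out


def multi_view_token_mixer(x, kernel_groups):
    """Hypothetical MVTM-style mixer.

    Splits the channels into groups and convolves each group with its own
    depthwise kernel size, so a single block sees several receptive fields.
    kernel_groups: list of arrays of shape (C_g, k_g, k_g) whose C_g sum to C.
    """
    splits = np.cumsum([kg.shape[0] for kg in kernel_groups])[:-1]
    parts = np.split(x, splits, axis=1)
    return np.concatenate(
        [depthwise_conv2d(part, kg) for part, kg in zip(parts, kernel_groups)],
        axis=1,
    )
```

Passing a different `kernel_groups` list to each stage (for example, hypothetically, smaller kernels in early stages and larger ones later) is one way to realize the per-stage receptive-field diversity the highlights describe; the actual kernel sizes and normalization weighting used by MVFormer are not stated here.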