Highlights
• We improve the data efficiency of ViT on small datasets.
• Our method incorporates multi-scale tokens within the global self-attention of ViT.
• Our approach enables regional cross-scale interaction through multi-scale fusion.
• We introduce a novel data augmentation schedule in the training phase.
• Experiments demonstrate the superior performance and data efficiency of our method.