Patient teacher can impart locality to improve lightweight vision transformer on small dataset

Jun Ling, Xuan Zhang, Fei Du, Linyu Li, Weiyi Shang, Chen Gao, Tong Li

Published: 2025, Last Modified: 22 Jul 2025. Pattern Recognit. 2025. License: CC BY-SA 4.0

Abstract: Highlights
• A lightweight Vision Transformer trained with knowledge distillation can excel on small datasets.
• Combining knowledge distillation with curriculum learning can improve distillation efficiency.
• Feature-based knowledge distillation can transfer the locality inductive bias to the lightweight Vision Transformer.
• We achieve state-of-the-art performance on 8 small datasets.