Patient teacher can impart locality to improve lightweight vision transformer on small dataset

Published: 01 Jan 2025, Last Modified: 22 Jan 2025, Venue: Pattern Recognit. 2025, License: CC BY-SA 4.0
Abstract: Highlights
• A lightweight Vision Transformer trained with knowledge distillation can excel on small datasets.
• Combining knowledge distillation with curriculum learning can improve distillation efficiency.
• Feature-based knowledge distillation can transfer the locality inductive bias to the lightweight Vision Transformer.
• We achieve state-of-the-art performance on 8 small datasets.