Abstract: Highlights
• A lightweight Vision Transformer trained with knowledge distillation can excel on small datasets.
• Combining knowledge distillation with curriculum learning can improve distillation efficiency.
• Feature-based knowledge distillation can transfer the locality inductive bias to the lightweight Vision Transformer.
• We achieve state-of-the-art performance on eight small datasets.
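To make the idea of feature-based knowledge distillation concrete, here is a minimal, generic sketch in plain Python. It is not the paper's method: the function names, the use of mean squared error as the feature-matching term, and the weighting parameter `alpha` are all illustrative assumptions. It only shows the common pattern of adding a teacher-to-student feature-matching penalty to the usual classification loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Hard-label cross-entropy for a single example."""
    return -math.log(softmax(logits)[label])


def feature_mse(student_feat, teacher_feat):
    """Mean squared error between student and teacher feature vectors.

    Assumes both vectors already have the same length; in practice a
    projection layer is often needed to align their dimensions.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def distillation_loss(student_logits, label, student_feat, teacher_feat, alpha=0.5):
    """Weighted sum of the task loss and the feature-matching loss.

    alpha is a hypothetical weighting hyperparameter, not a value from the paper.
    """
    task = cross_entropy(student_logits, label)
    match = feature_mse(student_feat, teacher_feat)
    return (1.0 - alpha) * task + alpha * match


if __name__ == "__main__":
    loss = distillation_loss(
        student_logits=[2.0, 0.5, -1.0],
        label=0,
        student_feat=[0.2, 0.4, 0.1],
        teacher_feat=[0.3, 0.5, 0.0],
    )
    print(f"combined loss: {loss:.4f}")
```

Under these assumptions, matching intermediate features lets a convolutional teacher pass on locally structured representations to the student, which is one plausible route by which a locality bias could be transferred.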