- Keywords: few-shot learning, data augmentation, CutMix
- TL;DR: We achieve new state-of-the-art results on few-shot learning benchmarks by combining online self-distillation with CutMix augmentation.
- Abstract: Few-shot learning has been a long-standing problem in learning to learn. It typically involves training a model on an extremely small amount of data and testing it on out-of-distribution data. Recent few-shot learning research has focused on developing good representation models that can quickly adapt to test tasks. To that end, we propose a model that learns representations through online self-distillation: it combines supervised training with knowledge distillation from a continuously updated teacher. We also find that data augmentation plays an important role in producing robust features. Our final model is trained with both CutMix augmentation and online self-distillation. On the commonly used miniImageNet benchmark, our model achieves 67.07\% and 83.03\% accuracy under the 5-way 1-shot and 5-way 5-shot settings, respectively, outperforming comparable methods by 2.25\% and 0.89\%.
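
The two ingredients named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. CutMix follows its standard formulation (a box whose area fraction is drawn from a Beta distribution is pasted from a second image, and the labels are mixed by the surviving area). The form of the "continuously updated teacher" is not specified in the abstract, so the exponential-moving-average update in `ema_update` is an assumption, as are the loss weighting `kd_weight`, the `temperature`, and all function names.

```python
import math
import random


def cutmix_box(height, width, lam, rng):
    """Return (top, bottom, left, right) of a box covering about (1 - lam) of the image."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)  # random box centre
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return top, bottom, left, right


def cutmix(img_a, img_b, alpha=1.0, rng=random):
    """Paste a random box of img_b into a copy of img_a (2-D lists of equal size).

    Returns the mixed image and lam, the fraction of pixels still taken from img_a.
    """
    height, width = len(img_a), len(img_a[0])
    lam = rng.betavariate(alpha, alpha)
    top, bottom, left, right = cutmix_box(height, width, lam, rng)
    mixed = [row[:] for row in img_a]
    for y in range(top, bottom):
        mixed[y][left:right] = img_b[y][left:right]
    # Recompute lam from the clipped box so the label weights match the pixels.
    lam = 1.0 - (bottom - top) * (right - left) / (height * width)
    return mixed, lam


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp((z - peak) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, logits, temperature=1.0):
    """Cross-entropy between a soft target distribution and softmax(logits / T)."""
    probs = softmax(logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


def training_loss(student_logits, teacher_logits, label_a, label_b, lam,
                  kd_weight=0.5, temperature=4.0):
    """Supervised CutMix loss plus a distillation term (weighting is an assumption)."""
    mixed_label = [0.0] * len(student_logits)
    mixed_label[label_a] += lam
    mixed_label[label_b] += 1.0 - lam
    supervised = cross_entropy(mixed_label, student_logits)
    teacher_probs = softmax(teacher_logits, temperature)
    distill = cross_entropy(teacher_probs, student_logits, temperature)
    return (1.0 - kd_weight) * supervised + kd_weight * distill


def ema_update(teacher_params, student_params, momentum=0.999):
    """Assumed teacher update: exponential moving average of the student weights."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In a training loop under these assumptions, each step would apply `cutmix` to a pair of images, evaluate `training_loss` on the student and teacher outputs for the mixed image, update the student by gradient descent, and then call `ema_update` so the teacher tracks the student online rather than being trained in a separate stage.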