Keywords: deep learning, transfer learning, data augmentation, adversarial learning, knowledge distillation, synthetic data, VAE
TL;DR: We develop an adversarial data augmentation framework that generates synthetic inputs that help knowledge distillation transfer useful knowledge.
Abstract: Knowledge distillation (KD) is a simple and successful method for transferring knowledge from a teacher to a student model based solely on functional activity. However, it has recently been shown that this method is unable to transfer simple inductive biases such as shift equivariance. To extend existing functional transfer methods like KD, we propose a general data augmentation framework that generates synthetic data points on which the teacher and the student disagree. We generate new input data through a learned distribution of spatial transformations of the original images. Through these synthetic inputs, our augmentation framework solves the problem of transferring simple equivariances with KD, leading to better generalization. Additionally, we generate new data points with a fine-tuned Very Deep Variational Autoencoder model, allowing for more abstract augmentations. Our learned augmentations significantly improve KD performance, even when compared to classical data augmentations. Finally, the augmented inputs are interpretable and offer a unique insight into the properties that are transferred to the student.
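
Illustrative sketch (not the authors' implementation): the idea of generating inputs on which teacher and student disagree can be pictured with a toy example. The snippet below assumes the standard KD objective, the KL divergence between temperature-softened teacher and student outputs, and replaces the learned distribution of spatial transformations with a brute-force search over a few circular shifts of a 1-D signal. The names `toy_teacher`, `toy_student`, and `most_disagreeing_shift` are hypothetical and exist only for this illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened class probabilities."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def shift(signal, k):
    """Circularly shift a 1-D list by k positions (toy spatial transform)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def toy_teacher(x):
    # Hypothetical shift-invariant model: logits ignore where values sit.
    return [sum(x), max(x)]


def toy_student(x):
    # Hypothetical position-sensitive model: logits depend on positions.
    return [x[0] + x[1], x[-1]]


def disagreement(x, k):
    """KD loss between teacher and student on the input shifted by k."""
    shifted = shift(x, k)
    return kd_loss(toy_teacher(shifted), toy_student(shifted))


def most_disagreeing_shift(x, shifts):
    """Assumed stand-in for the learned augmenter: pick the shift that
    maximizes teacher-student disagreement."""
    return max(shifts, key=lambda k: disagreement(x, k))


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 0.5]
    best = most_disagreeing_shift(x, range(len(x)))
    print("most adversarial shift:", best)
    print("KD loss on that input:", disagreement(x, best))
```

In an actual training loop, the student would then be trained to minimize the same loss on the selected inputs, while the augmenter is updated to maximize it. The discrete search here only mimics that adversarial step.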