Discovering and Overcoming Limitations of Noise-engineered Data-free Knowledge DistillationDownload PDF

Published: 31 Oct 2022, Last Modified: 12 Jan 2023NeurIPS 2022 AcceptReaders: Everyone
Keywords: Knowledge distillation, Gaussian noise, Batch normalization
Abstract: Distillation in neural networks using only the samples randomly drawn from a Gaussian distribution is possibly the most straightforward solution one can think of for the complex problem of knowledge transfer from one network (teacher) to the other (student). If successfully done, it can eliminate the requirement of teacher's training data for knowledge distillation and avoid often arising privacy concerns in sensitive applications such as healthcare. There have been some recent attempts at Gaussian noise-based data-free knowledge distillation, however, none of them offer a consistent or reliable solution. We identify the shift in the distribution of hidden layer activation as the key limiting factor, which occurs when Gaussian noise is fed to the teacher network instead of the accustomed training data. We propose a simple solution to mitigate this shift and show that for vision tasks, such as classification, it is possible to achieve a performance close to the teacher by just using the samples randomly drawn from a Gaussian distribution. We validate our approach on CIFAR10, CIFAR100, SVHN, and Food101 datasets. We further show that in situations of sparsely available original data for distillation, the proposed Gaussian noise-based knowledge distillation method can outperform the distillation using the available data with a large margin. Our work lays the foundation for further research in the direction of noise-engineered knowledge distillation using random samples.
TL;DR: An approach to show that data-free knowledge distillation can be done using only the samples randomly drawn from a standard Gaussian distribution.
Supplementary Material: pdf
11 Replies