Understanding Diversity Based Neural Network Pruning in Teacher Student Setup

Mar 04, 2021 (edited Apr 01, 2021), Neural Compression Workshop @ ICLR 2021
  • Keywords: Teacher student setup, Neural network pruning, Determinantal point process
  • TL;DR: We inspect different pruning techniques under the statistical mechanics formulation of a teacher-student framework and derive their generalization error bounds for comparison.
  • Abstract: Despite a multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We inspect different pruning techniques under the statistical mechanics formulation of a teacher-student framework and derive their generalization error (GE) bounds. In the first part, we theoretically prove the empirical observations of a recent work showing that Determinantal Point Process (DPP) based node pruning is notably superior to competing approaches when tested on real datasets. In the second part, we use our theoretical setup to prove that the baseline random edge pruning method performs better than the DPP node pruning method, consistent with the finding in the literature that sparse neural networks (edge pruned) generalize better than dense neural networks (node pruned) for a fixed number of parameters.
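The abstract contrasts the two pruning schemes it analyzes. As an illustration only (not the paper's construction), the sketch below shows one common way DPP-based node pruning is instantiated, via greedy determinant maximization over a similarity kernel of node activations, alongside the random edge pruning baseline; the kernel choice, sizes, and greedy MAP heuristic are all assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: activations of 8 hidden nodes over 50 inputs.
A = rng.standard_normal((50, 8))
L = A.T @ A  # similarity (DPP) kernel over the 8 nodes

def greedy_dpp_select(L, k):
    """Greedily keep k diverse nodes by maximizing the determinant of the
    kernel submatrix (a standard heuristic for DPP MAP inference)."""
    selected, remaining = [], list(range(L.shape[0]))
    for _ in range(k):
        best, best_det = None, -np.inf
        for j in remaining:
            idx = selected + [j]
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = j, det
        selected.append(best)
        remaining.remove(best)
    return selected

# Node pruning: keep a diverse subset of 4 of the 8 nodes.
kept_nodes = greedy_dpp_select(L, k=4)

# Baseline edge pruning: randomly zero out ~half of an 8x8 weight matrix,
# keeping roughly the same parameter budget as keeping 4 of 8 nodes.
W = rng.standard_normal((8, 8))
W_sparse = W * (rng.random(W.shape) < 0.5)
```

Node pruning removes whole rows/columns (a dense, smaller network), while edge pruning leaves a sparse matrix of the original shape; the paper's second result compares the generalization error of exactly these two regimes at a fixed parameter count.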