Breaking BatchNorm Barriers for Noise-driven Data Free Knowledge Distillation

TMLR Paper9020 Authors

18 May 2026 (modified: 03 Jun 2026) | Under review for TMLR | CC BY 4.0
Abstract: Distillation using Gaussian noise is the simplest instantiation of data-free knowledge distillation: it uses no auxiliary generator, no synthesized images, and no proxy data. The idea is to sample inputs from a standard Gaussian and match the student's outputs to the teacher's. In practice, however, this approach is fragile: its behavior varies sharply across architectures and is tightly coupled to the teacher's normalization choices. In this work, we systematically study when and why Gaussian-noise distillation succeeds or fails across model families, normalization schemes, and scales from CIFAR-10 to ImageNet-100, and we identify the main factors that control its stability and effectiveness. Building on these insights, we propose NormShift-KD, a normalization-aware framework for noise-driven distillation with two instantiations tailored to the teacher's normalization: (i) for BatchNorm teachers, we pair current-statistics (CS) inference with rejection sampling to correct the class imbalance that BatchNorm teachers exhibit on Gaussian inputs; (ii) for LayerNorm and GroupNorm teachers, we introduce a lightweight batch-alignment wrapper that restores the inter-sample coupling these per-sample normalizers lack, enabling noise-driven distillation from non-BatchNorm teachers for the first time. We further conduct a targeted BatchNorm ablation, progressively replacing BatchNorm in the teacher to map how transfer quality degrades and which components matter most, and we analyze how the student's architecture and normalization interact with the teacher's (CNN/BatchNorm vs. Transformer/LayerNorm). Finally, we provide a theoretical explanation for the failure modes observed in ViT-style models under Gaussian-noise inputs, making noise-driven distillation more interpretable and more broadly usable.
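The two ideas the abstract names for the basic setting, sampling inputs from a standard Gaussian and rejecting samples so that the teacher's predicted classes stay balanced, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear teacher and student, the per-class quota rule, the temperature-scaled KL loss, and every function name below are assumptions made for illustration only. Current-statistics inference and the batch-alignment wrapper are not shown.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def linear_logits(weights, bias, x):
    """Toy stand-in for a network: logits = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sample_balanced_noise(teacher, dim, num_classes, per_class, rng,
                          max_draws=100_000):
    """Draw x ~ N(0, I) and keep it only while the teacher's predicted
    class is still under its quota (an assumed form of rejection sampling)."""
    counts = [0] * num_classes
    batch = []
    for _ in range(max_draws):
        if len(batch) == per_class * num_classes:
            break
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        logits = teacher(x)
        label = max(range(num_classes), key=lambda k: logits[k])
        if counts[label] < per_class:
            counts[label] += 1
            batch.append(x)
    return batch, counts


def distillation_loss(teacher, student, batch, temperature=2.0):
    """Mean KL between temperature-softened teacher and student outputs."""
    total = 0.0
    for x in batch:
        p = softmax(teacher(x), temperature)
        q = softmax(student(x), temperature)
        total += kl_divergence(p, q)
    return total / len(batch)


# Hypothetical toy teacher whose bias favors class 0 on Gaussian inputs.
W_T = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
B_T = [1.0, 0.0, 0.0]


def teacher(x):
    return linear_logits(W_T, B_T, x)


def student(x):
    # Untrained student: all-zero weights, so its output is uniform.
    return [0.0, 0.0, 0.0]


rng = random.Random(0)
batch, counts = sample_balanced_noise(teacher, dim=4, num_classes=3,
                                      per_class=8, rng=rng)
loss = distillation_loss(teacher, student, batch)
```

In a real training loop the student's parameters would be updated to reduce `loss`; the sketch stops at the loss to stay independent of any deep learning framework.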
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Stefano_Sarao_Mannelli1
Submission Number: 9020