Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
Keywords: knowledge distillation, dataset distillation, memorization, learning theory, model transfer, shortcut learning
TL;DR: Neural networks can transfer purely memorized data through knowledge distillation, even on random datasets.
Abstract: Dataset and knowledge distillation transfer capabilities between models.
Their efficiency is often linked to structure in the data.
However, next to general skills, modern neural networks encode specific facts, but if and how such memorized information is transferred remains less understood.
To analyze the transfer of memorized information in isolation, we consider finite random i.i.d. datasets where generalization is a priori impossible and a successful teacher fit implies pure memorization.
Yet, we show that students can learn non-trivial accuracy on held out memorized teacher data they never directly observed - in some cases up to perfect accuracy.
This notebook showcases this phenomenon in three different contexts, and sets up the framework required for a deeper empirical and theoretical analysis.
Code: zip
Submission Number: 26
Loading