Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge

Freya Behrens; Lenka Zdeborova

Published: 10 Jun 2025, Last Modified: 15 Jul 2025. Venue: MOSS@ICML2025 Oral. License: CC BY 4.0

Keywords: knowledge distillation, dataset distillation, memorization, learning theory, model transfer, shortcut learning

TL;DR: Neural networks can transfer purely memorized data through knowledge distillation, even on random datasets.

Abstract: Dataset and knowledge distillation transfer capabilities between models. Their efficiency is often linked to structure in the data. However, alongside general skills, modern neural networks also encode specific facts, and whether and how such memorized information is transferred remains less understood. To analyze the transfer of memorized information in isolation, we consider finite random i.i.d. datasets where generalization is a priori impossible and a successful teacher fit implies pure memorization. Yet, we show that students can achieve non-trivial accuracy on held-out memorized teacher data that they never directly observed, in some cases up to perfect accuracy. This notebook showcases this phenomenon in three different contexts, and sets up the framework required for a deeper empirical and theoretical analysis.
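The experimental protocol described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' code: the linear softmax model, the Gaussian inputs, the sizes `n`, `d`, `c`, the split `student_fraction`, and the helper names `fit_linear_softmax` and `run_experiment` are all assumptions made for the example. A teacher memorizes random labels, a student is trained only on the teacher's soft labels for part of the data, and the student is then scored against the true labels of the remaining teacher data that it never saw.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_linear_softmax(X, targets, steps=2000, lr=0.1):
    """Full-batch gradient descent on cross-entropy.

    `targets` may be one-hot labels (teacher) or soft labels (student).
    """
    n, d = X.shape
    W = np.zeros((d, targets.shape[1]))
    for _ in range(steps):
        grad = X.T @ (softmax(X @ W) - targets) / n
        W -= lr * grad
    return W

def run_experiment(n=60, d=200, c=4, student_fraction=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Random i.i.d. inputs and labels: there is no structure to generalize from
    # (assumed Gaussian inputs and uniform labels, chosen for illustration).
    X = rng.standard_normal((n, d))
    y = rng.integers(0, c, size=n)

    # Teacher fits the hard labels; with d > n this amounts to memorization.
    W_teacher = fit_linear_softmax(X, np.eye(c)[y])
    teacher_acc = float(np.mean((X @ W_teacher).argmax(axis=1) == y))

    # Student sees only a subset of inputs, paired with the teacher's soft labels.
    n_seen = int(student_fraction * n)
    seen, held_out = np.arange(n_seen), np.arange(n_seen, n)
    soft_labels = softmax(X[seen] @ W_teacher)
    W_student = fit_linear_softmax(X[seen], soft_labels)

    # Score the student on teacher training data it never observed.
    preds = (X[held_out] @ W_student).argmax(axis=1)
    held_out_acc = float(np.mean(preds == y[held_out]))
    return {"teacher_acc": teacher_acc, "held_out_acc": held_out_acc, "chance": 1.0 / c}

if __name__ == "__main__":
    print(run_experiment())
```

Comparing `held_out_acc` with `chance` indicates whether any memorized information leaked through the soft labels in this toy configuration. The outcome depends on the assumed sizes and model, so the sketch should not be read as reproducing the reported results.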

Code: zip

Submission Number: 26
