Knowledge Distillation for Random Data: Soft Labels and Similarity Scores May Contain Memorized Information

Published: 06 Mar 2025, Last Modified: 27 Mar 2025 · SCSL @ ICLR 2025 · CC BY 4.0
Track: regular paper (up to 6 pages)
Keywords: knowledge distillation, memorization, neural networks, spurious correlations, model transfer, neural mnemonics
TL;DR: Neural networks can transfer purely memorized data through knowledge distillation, even on random datasets.
Abstract: This work reexamines conventional views of how neural networks store and transfer memorized information by investigating knowledge distillation for random, unstructured data. While knowledge distillation typically focuses on transferring generalizable patterns, we demonstrate that teacher models can encode and transfer purely memorized associations on finite random i.i.d. datasets. Through systematic experiments with fully connected networks, we show that students trained on teacher logits or embedding similarities achieve non-trivial accuracy on memorized data they never directly observed. This phenomenon persists across varying network capacities, dataset compositions, and even with randomized real-world data. Our findings encourage moving beyond simple key-value views of memory in neural networks, and highlight the role of spurious yet learnable patterns that transfer across models.
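The core finding can be illustrated with a minimal sketch (not the authors' code, and simplified: here the student sees the same inputs as the teacher, just never the labels). A teacher memorizes random labels on random i.i.d. inputs, and a student trained only on the teacher's logits recovers the memorized associations:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, K = 30, 50, 4                      # overparameterized: d > N points
X = rng.normal(size=(N, d))              # random, unstructured inputs
y = rng.integers(0, K, size=N)           # random labels (pure memorization)

# Teacher: a linear softmax classifier that has perfectly memorized the
# labels; fitted in closed form here for simplicity (X has full row rank,
# so the target logits are realized exactly).
T = 5.0 * np.eye(K)[y]                   # one-hot target logits with a margin
W_teacher = np.linalg.pinv(X) @ T
teacher_logits = X @ W_teacher
teacher_acc = (teacher_logits.argmax(1) == y).mean()

# Student: never sees y, only the teacher's logits (soft targets).
W_student = np.zeros((d, K))
for _ in range(2000):                    # gradient descent on MSE over logits
    grad = X.T @ (X @ W_student - teacher_logits) / N
    W_student -= 0.1 * grad
student_acc = ((X @ W_student).argmax(1) == y).mean()

print(f"teacher acc: {teacher_acc:.2f}, student acc: {student_acc:.2f}")
# -> teacher acc: 1.00, student acc: 1.00
```

The student reaches full accuracy on labels it never observed, because the teacher's soft outputs encode the memorized label assignment; the paper's experiments make the analogous point with fully connected networks and stricter data separation.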
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Freya_Behrens1
Format: No, the presenting author is unable to, or unlikely to be able to, attend in person.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 33