Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We draw inspiration from the spacing effect of biological learning and propose a new paradigm of knowledge distillation to improve generalization.
Abstract: Knowledge distillation (KD) is a powerful strategy for training deep neural networks (DNNs). While it was originally proposed to train a more compact “student” model from a large “teacher” model, many recent efforts have focused on adapting it as an effective way to promote generalization of the model itself, such as online KD and self KD. Here, we propose an easy-to-use and broadly compatible strategy named Spaced KD to improve the effectiveness of both online KD and self KD, in which the student model distills knowledge from a teacher model trained a space interval ahead of it. This strategy is inspired by a prominent theory from the field of biological learning and memory, the spacing effect, which posits that appropriate intervals between learning trials can significantly enhance learning performance. We provide an in-depth theoretical and empirical analysis showing that the benefits of the proposed spacing effect in KD stem from seeking flat minima during stochastic gradient descent (SGD). We perform extensive experiments to demonstrate the effectiveness of our Spaced KD in improving the learning performance of DNNs (e.g., the additional performance gain is up to 2.31% and 3.34% on Tiny-ImageNet over online KD and self KD, respectively). Our code has been released at https://github.com/SunGL001/Spaced-KD.
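
To make the idea of a "space interval" concrete, here is a minimal sketch (not the authors' released implementation; see the linked repo for their exact recipe) of one plausible spaced online-KD loop: the teacher is kept a fixed number of optimization steps ahead of the student, and the student distills from it on correspondingly delayed batches. Names such as `space_interval`, `kd_weight`, and `temperature` are illustrative assumptions.

```python
# Minimal sketch of spaced online KD, assuming a standard PyTorch setup.
# Hyperparameter names and the exact scheduling are assumptions for illustration.
from collections import deque

import torch
import torch.nn.functional as F


def spaced_kd_train(student, teacher, loader, space_interval=100,
                    kd_weight=1.0, temperature=4.0, lr=0.1, device="cpu"):
    """Keep the teacher `space_interval` steps ahead of the student;
    the student distills from it on batches delayed by that interval."""
    student, teacher = student.to(device), teacher.to(device)
    opt_s = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    opt_t = torch.optim.SGD(teacher.parameters(), lr=lr, momentum=0.9)
    buffer = deque()  # batches the teacher has seen but the student has not

    for x, y in loader:
        x, y = x.to(device), y.to(device)

        # 1) Advance the teacher on the current batch with plain cross-entropy.
        opt_t.zero_grad()
        F.cross_entropy(teacher(x), y).backward()
        opt_t.step()
        buffer.append((x, y))

        # 2) Once the teacher is a full space interval ahead, train the
        #    student on the delayed batch with CE plus KD toward the teacher.
        if len(buffer) > space_interval:
            xs, ys = buffer.popleft()
            with torch.no_grad():
                t_logits = teacher(xs)
            s_logits = student(xs)
            kd = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=1),
                F.softmax(t_logits / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2
            loss = F.cross_entropy(s_logits, ys) + kd_weight * kd
            opt_s.zero_grad()
            loss.backward()
            opt_s.step()
    return student
```

In this reading, the delay buffer is what realizes the spacing effect: the student never distills from a teacher at the same training progress, but from one that is a fixed interval ahead.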
Lay Summary: How can we help AI systems learn more effectively and generalize better — even in unfamiliar situations? Inspired by how humans and animals learn better when study sessions are spaced out over time, we propose a new way to train AI called Spaced Knowledge Distillation. This method introduces short delays between the updates of a “teacher” model and a “student” model, mimicking the benefits of spaced learning in biology. By carefully timing when the student learns from the teacher, our method encourages the student model to settle into more stable and reliable learning patterns. This results in better performance on real-world tasks and stronger resistance to noisy or unexpected data. Our approach works with existing training techniques, doesn’t add extra cost, and consistently improves performance across different AI models and datasets. It shows that in both brains and machines, when you learn matters just as much as what you learn.
Link To Code: https://github.com/SunGL001/Spaced-KD.git
Primary Area: General Machine Learning->Supervised Learning
Keywords: Knowledge Distillation, Brain-inspired AI, Machine Learning, Spacing Effect
Submission Number: 5904