Abstract: Continual learning aims to enable neural networks to sequentially acquire new tasks without catastrophic forgetting. However, the limited representational capacity of machine learning models often impairs knowledge retention for past tasks. In this paper, we propose a novel continual learning method that uses knowledge distillation from two teachers: one responsible for past tasks and the other for the present task. The proposed method distills the logits and class activation maps of the two teachers into a student model, which effectively alleviates catastrophic forgetting while facilitating the continuous acquisition of diverse task representations. This allows the student model to learn the full body of task knowledge by integrating and reinforcing the knowledge of past and present tasks. Experiments on CIFAR-100 and mini-ImageNet under constrained memory and training time demonstrate that the proposed method achieves state-of-the-art performance, with average accuracies of 62.91% and 71.16%, respectively. An analysis of the role of internal representations establishes the proposed method as a lightweight yet effective solution for continual learning with limited computational resources.
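As a rough illustration of the dual-teacher distillation described in the abstract, the PyTorch sketch below combines a temperature-scaled KL divergence on logits with an MSE loss on normalized class activation maps from both teachers. The tensor names (e.g. `old_teacher_logits`, `student_cam`) and the weights `temperature`, `alpha`, and `beta` are illustrative assumptions; the paper's exact losses, CAM extraction, and weighting scheme may differ.

```python
import torch
import torch.nn.functional as F

def dual_teacher_distillation_loss(student_logits, old_teacher_logits, new_teacher_logits,
                                   student_cam, old_teacher_cam, new_teacher_cam,
                                   temperature=2.0, alpha=0.5, beta=1.0):
    """Sketch of a two-teacher distillation objective (assumed form, not the paper's exact loss).

    *_logits: (batch, num_classes); *_cam: (batch, H, W).
    alpha balances the past-task and present-task teachers; beta weights the CAM term.
    In class-incremental settings the student's logits would typically be sliced to each
    teacher's class range; that detail is omitted here for brevity.
    """
    # Logit distillation: match softened student predictions to each teacher via KL divergence.
    def kd_logits(teacher_logits):
        return F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2

    logit_loss = alpha * kd_logits(old_teacher_logits) + (1 - alpha) * kd_logits(new_teacher_logits)

    # CAM distillation: match L2-normalized activation maps so the student preserves
    # the spatial attention of both teachers.
    def kd_cam(teacher_cam):
        s = F.normalize(student_cam.flatten(1), dim=1)
        t = F.normalize(teacher_cam.flatten(1), dim=1)
        return F.mse_loss(s, t)

    cam_loss = alpha * kd_cam(old_teacher_cam) + (1 - alpha) * kd_cam(new_teacher_cam)

    return logit_loss + beta * cam_loss
```

The combined loss would be added to the usual cross-entropy on the current task's labels when training the student; the relative weighting between the two teachers is a design choice the paper may handle differently.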
External IDs: dblp:conf/hais/ShengMC25