Keywords: Class-Incremental Semantic Segmentation; Continual Learning
TL;DR: This paper proposes Distribution-based Knowledge Distillation (DKD), a minimization–maximization distribution strategy that brings average performance close to the joint-learning upper bound in class-incremental semantic segmentation.
Abstract: Class-incremental semantic segmentation aims to progressively learn new classes while preserving previously acquired knowledge. The task becomes particularly challenging when prior training samples are unavailable due to data privacy or storage restrictions, leading to catastrophic forgetting. To address this issue, knowledge distillation is widely adopted as a constraint that maximizes the similarity of representations between the current model (learning new classes) and the previous model (retaining old ones). However, knowledge distillation inherently preserves the old-knowledge distribution with minimal modification, which limits the parameters available for learning new classes when substantial information about old classes is retained. Furthermore, the acquired old knowledge is often ignored when learning new knowledge, wasting what has already been learned. Together, these two problems increase the risk of class confusion and cause performance to deviate from that of joint learning. Based on this analysis, we propose Distribution-based Knowledge Distillation (DKD), a minimization–maximization distribution strategy. On the one hand, to alleviate parameter competition between old and new knowledge, we minimize the old-knowledge distribution after releasing parameters with low sensitivity to old classes. On the other hand, to effectively exploit previously acquired knowledge, we maximize the shared-knowledge distribution between old and new knowledge after approximating the new-knowledge distribution via Laplacian-based projection estimation. The proposed method achieves an excellent balance between stability and plasticity across nine diverse settings on Pascal VOC and ADE20K. Notably, its average performance approaches that of joint learning (the upper bound) while effectively reducing class confusion. The source code is provided in the supplementary material and will be made publicly available upon acceptance.
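For context, the sketch below illustrates the standard knowledge-distillation constraint that the abstract refers to (and argues against), not the proposed DKD method. It assumes a frozen previous model and a current model whose first channels correspond to the old classes; all names (curr_logits, prev_logits, temperature) are illustrative.

```python
# Minimal sketch of the conventional KD constraint in class-incremental
# segmentation: the current model's predictions on old classes are pulled
# toward those of the frozen previous model, which (as the abstract notes)
# keeps the old-knowledge distribution largely fixed.
import torch
import torch.nn.functional as F

def distillation_loss(curr_logits, prev_logits, temperature=2.0):
    """KL divergence between old-class predictions of the current and previous models.

    curr_logits: (B, C_new, H, W) logits from the model learning new classes.
    prev_logits: (B, C_old, H, W) logits from the frozen previous model.
    Only the first C_old channels of the current model are distilled.
    """
    c_old = prev_logits.shape[1]
    log_p = F.log_softmax(curr_logits[:, :c_old] / temperature, dim=1)
    q = F.softmax(prev_logits / temperature, dim=1)
    # Sum of per-pixel KL divergences averaged over the batch,
    # scaled by T^2 as in standard distillation.
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2
```

Because this constraint preserves the old-knowledge distribution with minimal modification, it motivates the paper's minimization–maximization alternative described in the abstract.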
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 15185