Generative Multi-modal Models are Good Class-Incremental Learners

Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng

Published: 01 Jan 2024, Last Modified: 03 Nov 2024, CVPR 2024, CC BY-SA 4.0

Abstract: In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. This bias stems mainly from the characteristics of discriminative models. With the growing popularity of generative multi-modal models, we explore replacing discriminative models with generative ones for CIL. However, transitioning from discriminative to generative models requires addressing two key challenges. The primary challenge lies in converting the generated textual information into a classification over distinct categories. Additionally, it requires formulating the task of CIL within a generative framework. To this end, we propose a novel generative multi-modal model (GMM) framework for class-incremental learning. Our approach directly generates labels for images using an adapted generative model. After obtaining the detailed text, we use a text encoder to extract text features and employ feature matching to determine the most similar label as the classification prediction. In the conventional CIL settings, we achieve significantly better results in long-sequence task scenarios. Under the few-shot CIL setting, we improve accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting. Our code is available at https://github.com/DoubleClass/GMM.
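The final step described in the abstract, matching the generated text against the candidate labels in a text-feature space, can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_text_encoder` is a hypothetical stand-in (a bag of character trigrams) for the real text encoder, the function names and the example caption are invented for illustration, and cosine similarity is assumed as the matching score.

```python
import math
from collections import Counter


def toy_text_encoder(text):
    """Hypothetical stand-in for a real text encoder: counts of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items())  # a Counter yields 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_label(generated_text, class_names, encode=toy_text_encoder):
    """Return the class name whose text feature is most similar to the generated text."""
    query = encode(generated_text)
    scores = {name: cosine(query, encode(name)) for name in class_names}
    return max(scores, key=scores.get), scores


# Assumed example: free-form text produced by a generative model for one image,
# including a misspelling that exact string matching would not tolerate.
classes = ["golden retriever", "tabby cat", "sports car"]
pred, scores = match_label("a photo of a golden retriver dog", classes)
print(pred, scores)
```

Matching in feature space rather than by exact string comparison lets free-form or slightly malformed generated text still map onto one of the known class names, and new classes can be added by extending `class_names` without adding a classifier head.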