Abstract: Class-incremental learning poses a significant challenge under the exemplar-free constraint, leading to catastrophic forgetting and sub-par incremental accuracy. Previous attempts have focused primarily on single-modality tasks, such as image classification or audio event classification. However, in Audio-Visual Class-Incremental Learning (AVCIL), the effective integration and utilization of heterogeneous modalities, with their complementary and mutually enhancing characteristics, remain largely unexplored. To bridge this gap, we propose the Multi-Modal Analytic Learning (MMAL) framework, an exemplar-free solution for AVCIL that employs a closed-form, linear approach. Specifically, MMAL introduces a modality-fusion module that re-formulates the AVCIL problem from a Recursive Least-Squares (RLS) perspective. Complementing this, a Modality-Specific Knowledge Compensation (MSKC) module is designed to further alleviate the under-fitting intrinsic to analytic learning by harnessing individual knowledge from the audio and visual modalities in tandem. Comprehensive experimental comparisons with existing methods show that the proposed MMAL achieves superior accuracies of 76.71%, 78.98%, and 76.19% on the AVE, Kinetics-Sounds, and VGGSounds100 datasets, respectively, setting a new state of the art for AVCIL. Notably, compared with memory-based methods, our exemplar-free MMAL better preserves data privacy while more effectively leveraging multi-modal information for improved incremental accuracy.
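To make the closed-form, exemplar-free idea concrete, the following is a minimal sketch of a generic recursive least-squares (ridge-regression) classifier update of the kind the abstract alludes to. All names and the specific update are illustrative of standard RLS, not taken from the MMAL paper: new-task features are absorbed via the Woodbury identity, so no past exemplars need to be stored.

```python
import numpy as np

# Illustrative sketch only: generic batch-recursive least-squares update for a
# linear classifier, assuming fused audio-visual features X and one-hot labels Y.
# None of these names come from the MMAL paper.

def init_state(feat_dim, num_classes, gamma=1.0):
    # R approximates the inverse regularized feature autocorrelation matrix
    # (gamma * I)^{-1}; W holds the linear classifier weights.
    R = np.eye(feat_dim) / gamma
    W = np.zeros((feat_dim, num_classes))
    return R, W

def rls_update(R, W, X, Y):
    # X: (n, feat_dim) features for the new task
    # Y: (n, num_classes) one-hot labels (new-class columns appended beforehand)
    # Woodbury identity: update R without revisiting old data (exemplar-free).
    M = np.eye(X.shape[0]) + X @ R @ X.T
    K = R @ X.T @ np.linalg.inv(M)
    R_new = R - K @ X @ R
    # Standard RLS weight correction; equals the batch ridge solution over all
    # tasks seen so far.
    W_new = W + R_new @ X.T @ (Y - X @ W)
    return R_new, W_new
```

After each task the recursive solution coincides with the batch ridge-regression solution computed on all data seen so far, which is what gives analytic-learning methods their forgetting-free classifier update.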