Abstract: The posterior estimation of parameters based on Bayesian theory is a crucial technique in Incremental Learning (IL). The estimated posterior is typically utilized to impose loss regularization, which aligns the current training model parameters with the previously learned posterior to mitigate catastrophic forgetting, a major challenge in IL. However, this additional loss regularization can also impose detriment to the model learning, preventing it from reaching the true global optimum. To overcome this limitation, this paper introduces a novel Bayesian IL framework, Robust Parameter Posterior Fusion (RP$^2$F). Unlike traditional methods, RP$^2$F directly estimates the parameter posterior for new data without introducing extra loss regularization, which allows the model to accommodate new knowledge more sufficiently. It then fuses this new posterior with the existing ones based on the Maximum A Posteriori (MAP) principle, ensuring effective knowledge sharing across tasks. Furthermore, RP$^2$F incorporates a common parameter-robustness priori to facilitate a seamless integration during posterior fusion. Comprehensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets show that RP$^2$F not only effectively mitigates catastrophic forgetting but also achieves backward knowledge transfer.
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: The Bayesian approach is a widely utilized technique in incremental learning (IL). Central to this approach is the addition of loss regularization, which aligns the current training model parameters with previously learned posteriors to mitigate catastrophic forgetting, a major challenge in IL. However, this additional loss regularization can also impose detriment to the model learning, preventing it from reaching the true global optimum. To overcome this limitation, this paper introduces a novel Bayesian IL framework, Robust Parameter Posterior Fusion (RP$^2$F). Unlike traditional methods, RP$^2$F directly estimates the parameter posterior for new data without introducing extra loss regularization, which allows the model to accommodate new knowledge more sufficiently. It then fuses this new posterior with the existing ones based on the Maximum A Posteriori (MAP) principle, ensuring effective knowledge sharing across tasks. Furthermore, RP$^2$F incorporates a common parameter-robustness priori to facilitate a seamless integration during posterior fusion. Comprehensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets show that RP$^2$F not only effectively mitigates catastrophic forgetting but also achieves backward knowledge transfer.
Supplementary Material: zip
Submission Number: 2836
Loading