Abstract: In continual learning, where task data arrives in a sequence, fine-tuning on later tasks often leads to performance degradation on earlier tasks. This is especially pronounced when these tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned with minimal computational expense? We propose Sequential Fine-tuning with Averaging (SFA), a method that merges the currently training model with earlier checkpoints during the course of training. Our method outperforms state-of-the-art merging and penalty methods, and achieves performance comparable to rehearsal with only a data buffer. In turn, our method offers insight into the benefits of merging partially trained models during training across both image and language domains.
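The core idea described in the abstract, periodically averaging the in-training model with an earlier checkpoint, can be illustrated with a minimal sketch. This is an assumed interpretation, not the paper's reference implementation: the function names (`average_into`, `finetune_task`), the merge interval `merge_every`, and the interpolation weight `beta` are all illustrative placeholders.

```python
import torch


def average_into(model: torch.nn.Module, checkpoint_state: dict, beta: float = 0.5) -> None:
    """In place, set each parameter to beta * current + (1 - beta) * checkpoint.

    Hypothetical helper; the actual averaging rule and coefficient in SFA may differ.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            param.mul_(beta).add_(checkpoint_state[name], alpha=1.0 - beta)


def finetune_task(model, loader, optimizer, loss_fn, merge_every=100, beta=0.5):
    # Snapshot taken before fine-tuning on the new task serves as the "earlier" checkpoint.
    checkpoint = {k: v.detach().clone() for k, v in model.named_parameters()}
    for step, (x, y) in enumerate(loader, start=1):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        if step % merge_every == 0:
            # Periodically merge the partially trained model back toward the checkpoint.
            average_into(model, checkpoint, beta)
```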
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Elahe_Arani1
Submission Number: 5128