Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual LearningDownload PDF

06 Oct 2022 (modified: 05 May 2023)INTERPOLATE at NeurIPS 2022Readers: Everyone
Keywords: continual learning, foundation models, generalization
TL;DR: We showcase that momentum-based weight-space interpolation has particular advantages for the adaptation of foundation models to continual learning.
Abstract: Large pretrained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent finetuning can considerably improve performance on a selected downstream task. However, through naive finetuning, these zero-shot models lose their generalizability and robustness towards distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where continuous adaptation has to be performed as new task distributions are introduced sequentially. In this work, we showcase that where finetuning falls short to adapt such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements for CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over $+4\%$ on standard CL benchmarks, while reducing the error to the upper limit of jointly training on all tasks at once in parts by more than half, allowing the continual learner to inch closer to the joint training limits.
0 Replies

Loading