C-Flat Turbo: A Faster Path to Continual Learning

18 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: continual learning; flat sharpness; sharpness-aware minimization
Abstract: Continual Learning (CL) aims to train neural networks on a dynamic task stream without forgetting previously learned knowledge. With the rise of pre-training techniques, strong model generalization has become essential for stable learning. C-Flat is a powerful and general CL training regime that promotes generalization by seeking flatter optima across sequential tasks. However, it requires three additional gradient computations per step, resulting in up to 4$\times$ computational overhead. In this work, we propose C-Flat Turbo, a faster yet stronger optimizer with a relaxed scheduler that substantially reduces training cost. We show that gradients toward first-order flatness contain direction-invariant components with respect to the proxy model at $\theta + \epsilon_1^*$, which allows us to skip redundant gradient computations in the perturbed ascent steps. Furthermore, a stage-wise step scheduler and adaptive triggering of the regularization mechanism enable dynamic control of C-Flat behavior throughout training. Experiments demonstrate that our optimizer accelerates most CL methods by at least 1$\times$ (up to 1.25$\times$) relative to C-Flat while achieving better performance. Code will be released upon acceptance.
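The core speed-up described in the abstract, reusing the direction-invariant part of the gradient at the proxy point $\theta + \epsilon_1^*$ instead of recomputing it in further ascent passes, can be pictured with a short PyTorch-style sketch. This is only an illustrative reading of the abstract, not the authors' released code: the single SAM-style ascent step, the hyperparameter name `rho1`, and the exact reuse rule are all assumptions.

```python
# Illustrative sketch only (assumed structure, not the authors' algorithm):
# one SAM-style ascent to a proxy point theta + eps1, whose gradient is then
# reused for the descent step instead of paying for further ascent passes.

import torch

def cflat_turbo_step(model, loss_fn, inputs, targets, base_opt, rho1=0.05):
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient at the current weights theta.
    loss = loss_fn(model(inputs), targets)
    g0 = torch.autograd.grad(loss, params)

    # Ascent: perturb to the proxy point theta + eps1* along the normalized
    # gradient direction (SAM-style worst-case perturbation).
    grad_norm = torch.norm(torch.stack([g.norm() for g in g0])) + 1e-12
    eps1 = [g * (rho1 / grad_norm) for g in g0]
    with torch.no_grad():
        for p, e in zip(params, eps1):
            p.add_(e)

    # Gradient at the proxy point. Full C-Flat would spend extra backward
    # passes on further perturbed ascent steps here; the "Turbo" reading of
    # the abstract is that part of this direction is invariant, so g1 is
    # simply reused (an assumption on our part).
    g1 = torch.autograd.grad(loss_fn(model(inputs), targets), params)

    # Restore theta and descend with the flatness-aware gradient.
    with torch.no_grad():
        for p, e in zip(params, eps1):
            p.sub_(e)
    base_opt.zero_grad(set_to_none=True)
    for p, g in zip(params, g1):
        p.grad = g.clone()
    base_opt.step()
    return loss.detach()
```

In a training loop, this step would replace the usual backward/step pair, e.g. `cflat_turbo_step(model, criterion, x, y, optimizer)` once per batch; the stage-wise scheduler and adaptive triggering mentioned in the abstract would decide on which steps the perturbed ascent is applied at all.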
Supplementary Material: pdf
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 12146