Abstract: Knowledge reuse is essential for continual learning, and current methods attempt to realize it through regularization or experience replay. These two strategies have complementary strengths: for example, regularization methods are compact, while replay methods can mimic batch training more accurately. At present, little has been done to find principled ways to combine the two, and current heuristics can give suboptimal performance. Here, we provide a principled approach to combine and improve them by using a recently proposed principle of adaptation, where the goal is to reconstruct the “gradients of the past”, i.e., to mimic batch training by estimating gradients from past data. Using this principle, we design a prior that provably gives better gradient reconstructions by utilizing two types of replay and a quadratic weight regularizer. This improves performance on standard benchmarks such as Split CIFAR, Split TinyImageNet, and ImageNet-1000. Our work shows that a good combination of replay- and regularizer-based methods can be very effective in reducing forgetting, and can sometimes even completely eliminate it.
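To make the combination concrete, below is a minimal PyTorch-style sketch of the general idea the abstract describes: a training objective whose gradient approximates the “gradient of the past” by summing a replay term over stored examples with a quadratic penalty anchored at the previous task's weights. This is a generic replay-plus-regularizer objective under assumed names (`combined_loss`, `anchor_params`, `reg_strength`), not the paper's exact prior or its two specific replay types.

```python
# Illustrative only: a generic replay + quadratic-regularizer objective,
# not the paper's exact method. All names here are hypothetical.
import torch
import torch.nn.functional as F


def combined_loss(model, batch, replay_batch, anchor_params, reg_strength=1.0):
    """Current-task loss + replay loss + quadratic penalty around old weights.

    The replay term estimates past gradients directly from stored examples;
    the quadratic term cheaply stands in for the remaining past-gradient mass.
    """
    x, y = batch
    loss = F.cross_entropy(model(x), y)           # current-task loss

    if replay_batch is not None:                  # replayed past examples
        xr, yr = replay_batch
        loss = loss + F.cross_entropy(model(xr), yr)

    # Quadratic weight regularizer anchored at the previous task's solution.
    reg = sum(((p - a.detach()) ** 2).sum()
              for p, a in zip(model.parameters(), anchor_params))
    return loss + 0.5 * reg_strength * reg
```

In a method of this family, the regularizer's curvature and the replay weighting would be chosen so that the combined gradient best reconstructs the batch gradient over all past data; the sketch uses an isotropic penalty purely for brevity.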
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (i.e., none of the above)
TL;DR: We propose a new, principled yet practical continual learning method that combines the complementary benefits of function regularization, weight regularization, and experience replay.