Keywords: Life-long continual learning, K-prior, Compact memory, Hessian matching
Abstract: Despite recent progress, continual lifelong learning cannot yet match the performance of batch learning. This is partly because the regularization methods used for continual learning are not as effective as stochastic gradients computed on the whole data. Replay methods can reconstruct past gradients and often work better, but the memory buffer can grow large with the number of tasks, making the method slow.
Here, we propose a new method to build a compact memory that accurately reconstructs past gradients. We build on the framework of Khan and Swaroop (2021), who prove the existence of an optimal memory that perfectly reconstructs the gradients. We show that, for linear regression, the optimal memory is obtained by Hessian matching, and we use this result to extend the method to logistic regression via probabilistic PCA. We confirm our findings on small-scale classification problems. Overall, we hope to encourage future research on compact memory for continual learning.
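To make the Hessian-matching idea concrete, below is a minimal sketch (not the authors' implementation; data and variable names are hypothetical) for the linear-regression case: the Hessian of the squared loss is H = X^T X, so a compact memory of k pseudo-inputs M can be built from the top-k scaled eigenvectors of H, and the past-task gradient H(w - w*) is then reconstructed from memory as M^T M (w - w*).

```python
# Sketch of Hessian matching for linear regression (assumed setup, not the paper's code):
# build k pseudo-inputs M whose Gram matrix M^T M approximates H = X^T X,
# so the past-task gradient H (w - w_star) can be reconstructed from memory.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 20, 5            # n past examples, d features, k memory slots

X = rng.normal(size=(n, d))      # past-task inputs (synthetic placeholder data)
w_star = rng.normal(size=d)      # past-task optimum (assumed stored)

H = X.T @ X                      # exact Hessian of the squared loss
eigval, eigvec = np.linalg.eigh(H)
top = np.argsort(eigval)[-k:]    # keep the k largest eigen-directions

# Each memory "point" is a scaled eigenvector, so M^T M = sum_i lam_i u_i u_i^T.
M = np.sqrt(eigval[top])[:, None] * eigvec[:, top].T   # shape (k, d)

w = rng.normal(size=d)           # current weights while training on a new task
g_exact = H @ (w - w_star)       # true past-task gradient
g_memory = M.T @ (M @ (w - w_star))  # reconstruction from the compact memory

print("relative error:",
      np.linalg.norm(g_exact - g_memory) / np.linalg.norm(g_exact))
```

With k equal to the rank of H the reconstruction is exact; with k smaller, the error is governed by the discarded eigenvalues, which is what makes a compact memory possible when the Hessian's spectrum decays quickly.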
Submission Number: 32