Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: continual learning, constraint, layer freezing, efficient learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The majority of online continual learning (CL) methods restrict the size of the replay memory and allow only single-epoch training to ensure prompt model updates. However, single-epoch training can entail a very different amount of computation per CL algorithm, and additional storage for logits or model copies beyond the replay memory is largely ignored in the storage budget. Here, we use floating-point operations (FLOPs) and total memory size in bytes as measures of the computational and memory budgets, respectively, to compare CL algorithms under the same total budget. Interestingly, we find that new and advanced algorithms often perform worse than simple baselines under the same budget, implying that their benefits may not carry over to real-world deployment. To improve the accuracy of online continual learners under the same budget, we propose adaptive layer freezing and frequency-based retrieval from episodic memory, yielding a storage- and computation-efficient online CL algorithm. The proposed adaptive layer freezing skips updating layers for less informative batches, reducing computational cost with a negligible loss of accuracy. The proposed memory retrieval balances the training-usage counts of samples in episodic memory at negligible computational and memory cost. In extensive empirical validation on CIFAR-10/100, CLEAR-10, and ImageNet-1K, we demonstrate that the proposed method outperforms the state-of-the-art under the same total budget.
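
For intuition, below is a minimal sketch of what frequency-based memory retrieval could look like: samples in episodic memory are drawn with probability inversely related to how often they have already been used for training, so usage counts stay balanced. The class name, the reservoir-style insertion policy, and the 1/(1+count) weighting are illustrative assumptions, not the paper's exact procedure.

```python
import random

import torch


class FrequencyBalancedMemory:
    """Episodic memory whose retrieval favors samples with low
    training-usage counts (hypothetical sketch, not the paper's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []      # stored (input, label) pairs
        self.use_counts = []   # times each stored sample was used for training

    def insert(self, sample):
        # Reservoir-style insertion (an assumption; the abstract does not
        # specify the insertion policy).
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
            self.use_counts.append(0)
        else:
            idx = random.randrange(self.capacity)
            self.samples[idx] = sample
            self.use_counts[idx] = 0

    def retrieve(self, batch_size):
        # Weight each stored sample inversely to its usage count so that
        # rarely used samples are drawn more often.
        weights = torch.tensor([1.0 / (1 + c) for c in self.use_counts])
        num = min(batch_size, len(self.samples))
        idxs = torch.multinomial(weights, num, replacement=False).tolist()
        for i in idxs:
            self.use_counts[i] += 1
        return [self.samples[i] for i in idxs]
```

This keeps both the extra storage (one integer counter per stored sample) and the extra computation (a weighted draw per retrieval) negligible relative to the training step itself, in line with the budget-aware framing above.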
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3264