Keywords: Masked Image Modeling, Representation Learning, Continual Learning, Unsupervised Learning, Pretraining
TL;DR: In this paper, we tackle the novel problem of Continual Pre-training, which aims to continually improve the generalization of model representations, encouraging positive transfer to future problems.
Abstract: With the practical need for machines that can learn over a lifetime of continuously changing real-world problems, continual learning has progressed rapidly in recent years. However, most recent work on continual learning focuses on alleviating catastrophic forgetting in a model trained over a sequence of vision tasks, considering only the performance on the tasks themselves rather than the transferability of the learned representations.
In this paper, we tackle the novel problem of Continual Pre-training, which aims to continually improve the generalization of model representations, encouraging positive transfer to future problems. An initial empirical study yields a rather surprising finding: the transfer quality of pre-trained model representations, under both supervised and unsupervised task sequences, shows no noticeable degradation even with full fine-tuning.
Furthermore, we propose a simple yet efficient Continual Pre-training method with GLobal Attention Discretization (GLAD), which introduces a new constraint to improve the global transferability of the backbone while projecting model weights to adapt to target problems via additional weight vectors. Our continual pre-training method breaks the barrier between the pre-training and fine-tuning steps, leading to an integrated design that combines continual representation learning with continual learning of task-specific learners.
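The abstract does not spell out how the additional weight vectors enter the model; as one plausible reading, adapting to a target task via extra weight vectors could resemble a low-rank update that leaves the shared backbone weights intact. The names `W_shared`, `task_weight`, `u`, and `v` below are illustrative assumptions, not the paper's actual formulation; this is a minimal NumPy sketch, not GLAD itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared backbone weight, continually pre-trained and kept task-agnostic.
d_in, d_out = 8, 4
W_shared = rng.normal(size=(d_out, d_in))

def task_weight(W, u, v):
    """Effective weight for one task: shared weights plus a rank-1
    update built from additional weight vectors u and v, so the task
    adapts the projection without overwriting the backbone."""
    return W + np.outer(u, v)

# Hypothetical task-specific weight vectors.
u = rng.normal(size=d_out)
v = rng.normal(size=d_in)

x = rng.normal(size=d_in)
y_backbone = W_shared @ x                   # backbone-only features
y_task = task_weight(W_shared, u, v) @ x    # task-adapted features
```

Because the task-specific update is stored separately, the next task (or the next round of pre-training) starts from the same unmodified `W_shared`, which is one way the pre-training/fine-tuning barrier described above could be removed.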
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip