Language Models as Feature Extractors for Accurate Continual Learning

Saleh Momeni; Sahisnu Mazumder; Bing Liu

28 Sept 2024 (modified: 18 Nov 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0

Keywords: class incremental learning, continual learning with LMs, distance-based methods

Abstract: This paper addresses the challenges of class incremental learning (CIL) within the broader context of continual learning. In CIL, a system learns a sequence of tasks or classes incrementally. The resulting classifier can assign a test sample to any class learned so far without relying on task-specific information at test time. CIL presents two significant challenges: catastrophic forgetting (CF) and inter-task class separation (ICS). ICS arises because the system lacks data from previous tasks when learning new ones, which makes it harder to establish decision boundaries between classes of different tasks and reduces accuracy. This paper proposes a novel method to overcome both CF and ICS. The base classifier uses the Mahalanobis distance (MD), a statistical measure of the distance from a data point to a normal distribution. In the proposed approach, each class is represented by a normal distribution whose mean and covariance are derived from the features of its training data, which are extracted from a language model (LM). To reduce storage, all classes share a common covariance matrix. Two additional techniques are proposed to enhance accuracy: (1) using a kernel function to expand the feature space, and (2) incorporating an ensemble mechanism. Our experiments show that the proposed method achieves accuracy comparable to the upper-bound accuracy of joint fine-tuning, which, to our knowledge, has not been achieved before.
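
The base classifier described in the abstract (one mean per class, a single covariance matrix shared by all classes, prediction by smallest Mahalanobis distance) can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `SharedCovMahalanobisClassifier`, the pooled within-class estimate of the shared covariance, and the `shrinkage` ridge term are all assumptions made for illustration, and random vectors stand in for features that would come from a frozen LM. The kernel expansion and the ensemble mechanism are not shown because the abstract does not specify them.

```python
import numpy as np


class SharedCovMahalanobisClassifier:
    """Hypothetical sketch: class means plus one shared covariance matrix."""

    def __init__(self, dim, shrinkage=1e-3):
        self.dim = dim
        self.shrinkage = shrinkage            # assumed ridge term so the covariance is invertible
        self.means = {}                       # class label -> mean feature vector
        self.scatter = np.zeros((dim, dim))   # running sum of within-class scatter
        self.n_samples = 0

    def add_class(self, label, features):
        """Add a new class from its (n, dim) feature matrix.

        Only running statistics are updated, so data from earlier
        classes is never revisited.
        """
        features = np.asarray(features, dtype=float)
        mean = features.mean(axis=0)
        centered = features - mean
        self.means[label] = mean
        self.scatter += centered.T @ centered
        self.n_samples += features.shape[0]

    def _precision(self):
        # Assumption: shared covariance = pooled within-class covariance.
        cov = self.scatter / max(self.n_samples, 1)
        cov = cov + self.shrinkage * np.eye(self.dim)
        return np.linalg.inv(cov)

    def predict(self, features):
        """Return, for each row, the label with the smallest squared MD."""
        features = np.atleast_2d(np.asarray(features, dtype=float))
        precision = self._precision()
        labels = list(self.means)
        dists = []
        for label in labels:
            diff = features - self.means[label]
            # Squared Mahalanobis distance: diff^T * precision * diff, per row.
            dists.append(np.einsum("nd,de,ne->n", diff, precision, diff))
        nearest = np.argmin(np.stack(dists, axis=1), axis=1)
        return [labels[i] for i in nearest]


# Toy usage: random vectors stand in for LM-extracted features.
rng = np.random.default_rng(0)
clf = SharedCovMahalanobisClassifier(dim=4)
clf.add_class("a", rng.normal(loc=0.0, size=(50, 4)))  # first task
clf.add_class("b", rng.normal(loc=5.0, size=(50, 4)))  # later task, no access to "a" data
preds = clf.predict([[0, 0, 0, 0], [5, 5, 5, 5]])
print(preds)
```

Because adding a class only updates a mean and a running scatter sum, nothing learned for earlier classes is overwritten, which is one way to read the abstract's claim that a distance-based classifier on fixed LM features sidesteps catastrophic forgetting.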

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 12880
