SupMem: Support Memorization for Semiparametric Language Models
Keywords: NLP, semiparametric model, language model, continual learning, scalability, support vectors, data compression, kNN-LM
Abstract: Semiparametric language models (Semi-LMs) combine the strengths of neural language models and non-parametric memory, making them well suited to continual learning from dynamic text data with little catastrophic forgetting. However, their scalability is limited because memory cost grows linearly with data and model size. To mitigate this scalability issue, we propose Support Memorization (SupMem), which memorizes only support entries, inspired by the concept of support vectors in SVMs and the equivalence between Transformers and SVMs in terms of optimization geometry \citep{tarzanagh_transformers_2023,tarzanagh_margin_2023}. We first present a novel perspective on support entry identification, modeling SupMem as an optimization problem that maximizes the expectation of support entries under a constraint on the number of support entries. We then provide theoretical analyses that lead to a feasible approximate solution in practice. Experimental results show that SupMem matches the performance of the conventional full memorization method with much lower memory consumption, demonstrating its effectiveness in improving both data and model scalability for continual learning in Semi-LMs.
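As a rough illustration of the formulation described in the abstract (the notation below is ours, not taken from the submission): letting $\mathcal{D}$ denote the non-parametric datastore, $s(e) \in \{0,1\}$ an indicator that entry $e$ is a support entry, and $B$ a memory budget, one reading of the objective is

$$\max_{\mathcal{S} \subseteq \mathcal{D}} \; \mathbb{E}\!\left[\sum_{e \in \mathcal{S}} s(e)\right] \quad \text{s.t.} \quad |\mathcal{S}| \le B,$$

i.e., choose which entries to memorize so as to maximize the expected number of support entries retained, subject to a cap on memory; the paper's exact constraint and expectation may differ.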
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9143