ResMem: Learn what you can and memorize the rest

Zitong Yang; Michal Lukasik; Vaishnavh Nagarajan; Zonglin Li; Ankit Singh Rawat; Manzil Zaheer; Aditya Krishna Menon; Sanjiv Kumar

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster

Keywords: deep learning, generalization, memorization, deep learning theory, boosting, nearest neighbor

TL;DR: We introduce residual-memorization (ResMem), a simple yet effective algorithm that wraps around a prediction model and improves its test accuracy, verified empirically and theoretically.

Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g., a neural network) by fitting the model's residuals with a nearest-neighbor-based regressor. The final prediction is then the sum of the original model's prediction and the fitted residual regressor's output. By construction, ResMem can explicitly memorize the training labels. We start by formulating a stylized linear regression problem and rigorously show that ResMem achieves a more favorable test risk than a base linear neural network. Then, we empirically show that ResMem consistently improves the test set generalization of the original prediction model across standard vision and natural language processing benchmarks.
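The two-stage recipe in the abstract (fit a base model, fit its residuals with a nearest-neighbor regressor, then sum the two) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the least-squares base model, the unweighted k-nearest-neighbor average, the synthetic data, and all function names (`fit_linear`, `knn_residual`, `resmem_predict`) are assumptions chosen for illustration.

```python
import numpy as np

def fit_linear(X, y):
    # Assumed base model: ordinary least squares with a bias column.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def knn_residual(X_train, residuals, X_query, k=1):
    # Squared Euclidean distance from every query point to every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest training points for each query.
    idx = np.argsort(d2, axis=1)[:, :k]
    # Unweighted average of the neighbors' residuals (an assumed simplification).
    return residuals[idx].mean(axis=1)

def resmem_predict(w, X_train, residuals, X_query, k=1):
    # Final prediction: base model output plus the memorized residual.
    return predict_linear(w, X_query) + knn_residual(X_train, residuals, X_query, k)

# Hypothetical synthetic data: a nonlinear target that a linear model cannot fit exactly.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = np.sin(3 * X_train[:, 0]) + 0.5 * X_train[:, 1]

w = fit_linear(X_train, y_train)
residuals = y_train - predict_linear(w, X_train)  # what the base model failed to learn
train_pred = resmem_predict(w, X_train, residuals, X_train, k=1)
```

With `k=1` and distinct training points, each training point is its own nearest neighbor, so the combined predictor reproduces the training labels, which mirrors the abstract's remark that ResMem can memorize them by construction. Larger `k` trades exact memorization for smoother residual estimates.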

Supplementary Material: pdf

Submission Number: 3155
