Associative Memories with Heavy-Tailed Data

Published: 23 Oct 2023, Last Modified: 13 Nov 2023, Venue: HeavyTails 2023
Keywords: associative memory, scaling law, Zipf data, optimization-based algorithm, mechanistic interpretability
TL;DR: Scaling law for associative memory model with Zipf data + optimization study
Abstract: Learning arguably involves the discovery and memorization of abstract rules. But how do associative memories appear in transformer architectures optimized with gradient descent algorithms? We derive precise scaling laws for a simple input-output associative memory model with respect to parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret theoretical results, including fine-grained visualizations of the stored memory associations.
Submission Number: 9
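
Illustrative sketch: the abstract does not spell out the model, so the following is only one plausible reading of an "input-output associative memory" on Zipf data, not the authors' construction. It assumes random unit-norm embeddings, a memory matrix built as an unweighted sum of outer products, an arbitrary ground-truth map (token index modulo the number of outputs), and argmax decoding. The names `zipf_probs` and `memory_error` and all parameters are hypothetical.

```python
import numpy as np


def zipf_probs(n_tokens, alpha):
    """Zipf law over token ranks 1..n_tokens: p(x) proportional to x^(-alpha)."""
    ranks = np.arange(1, n_tokens + 1)
    weights = ranks ** (-float(alpha))
    return weights / weights.sum()


def memory_error(n_tokens, n_outputs, dim, alpha, seed=0):
    """Probability mass of tokens that the memory decodes incorrectly.

    Assumptions (not taken from the source): random unit-norm embeddings,
    target map f(x) = x mod n_outputs, and W = sum_x u_{f(x)} e_x^T.
    """
    rng = np.random.default_rng(seed)
    p = zipf_probs(n_tokens, alpha)
    target = np.arange(n_tokens) % n_outputs

    # Random input embeddings e_x and output embeddings u_y, one per row.
    E = rng.standard_normal((n_tokens, dim))
    E /= np.linalg.norm(E, axis=1, keepdims=True)
    U = rng.standard_normal((n_outputs, dim))
    U /= np.linalg.norm(U, axis=1, keepdims=True)

    # Memory matrix: sum over tokens of the outer product u_{f(x)} e_x^T.
    W = U[target].T @ E

    # scores[y, x] = u_y^T W e_x; decode each token by argmax over outputs.
    scores = U @ W @ E.T
    pred = scores.argmax(axis=0)

    # Error weighted by the Zipf frequency of each misdecoded token.
    return float(p[pred != target].sum())
```

Sweeping `dim` for a fixed `alpha` and plotting the returned error on log-log axes is one way to look for the kind of parameter-size scaling the abstract refers to. With `dim` much larger than `n_tokens`, the random embeddings are nearly orthogonal and the error should vanish.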