In-Context Learning as Conditioned Associative Memory Retrieval

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We provide an exactly solvable example for interpreting In-Context Learning (ICL) with one-layer attention models as conditional retrieval from dense associative memory models.
Abstract: We provide an exactly solvable example for interpreting In-Context Learning (ICL) with one-layer attention models as conditional retrieval from dense associative memory models. Our main contribution is to interpret ICL as memory reshaping in the modern Hopfield model induced by a conditional memory set (the in-context examples). Specifically, we show that the in-context sequential examples induce an effective reshaping of the energy landscape of a Hopfield model. We integrate this in-context memory-reshaping phenomenon into the existing Bayesian model averaging view of ICL [Zhang et al., AISTATS 2025] via the established equivalence between the modern Hopfield model and transformer attention. Under this perspective, we not only characterize how in-context examples shape predictions in the Gaussian linear regression case, but also recover the known $\epsilon$-stability generalization bound of ICL for the one-layer attention model. We also explain three key behaviors of ICL and validate them through experiments.
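For readers unfamiliar with the modern Hopfield model, the following is a minimal sketch, assuming the standard dense associative memory formulation of Ramsauer et al. (2021); the conditional-memory notation is illustrative and not the paper's exact definition. The energy and one-step retrieval rule are

$$E(\xi) \;=\; -\tfrac{1}{\beta}\log\sum_{\mu=1}^{N}\exp\!\big(\beta\, x_\mu^{\top}\xi\big) \;+\; \tfrac{1}{2}\,\xi^{\top}\xi, \qquad \xi^{\mathrm{new}} \;=\; X\,\mathrm{softmax}\!\big(\beta\, X^{\top}\xi\big),$$

where $X = [x_1, \dots, x_N]$ collects the stored memory patterns and the retrieval update has the same form as softmax attention with query $\xi$ and key/value matrix $X$. Appending a conditional memory set of in-context examples $C = [c_1, \dots, c_L]$, i.e. replacing $X$ by $[X, C]$, changes the log-sum-exp term and thereby reshapes the energy landscape (and its fixed points) that the query descends.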
Lay Summary: Large language models like ChatGPT can solve new problems just by being shown a few examples in a prompt. We are curious about how these models manage to “learn” so quickly without updating their internal parameters, and whether there’s a simple explanation behind this surprising behavior. We found that this process can be understood as a kind of memory retrieval. Specifically, we use a classic brain-inspired model called a Hopfield network to show how each example in the prompt subtly reshapes what the model “remembers.” This reshaping helps the model focus on the most relevant information for making predictions — just like how a person recalls different memories depending on the question they’re asked. To test this idea, we build a simplified version of a language model and run experiments with it. Our results confirm that in-context learning is stronger when the examples are similar to the test case, accurate, and drawn from a familiar setting.
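To make the simplified setting concrete, below is a minimal, hypothetical Python sketch (not the authors' experimental code; the function names and the choice of a Gaussian linear regression task with inverse temperature `beta` are assumptions for illustration) of a one-layer softmax-attention predictor that treats the in-context examples as a conditional memory set:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def icl_attention_predict(X_ctx, y_ctx, x_query, beta=1.0):
    """Predict y for x_query via softmax attention over in-context (x, y) pairs.

    The prompt acts as a conditional memory set: the query retrieves a convex
    combination of stored labels, i.e. an associative-memory read whose
    weights are shaped by the in-context examples.
    """
    scores = beta * (X_ctx @ x_query)   # similarity of the query to each stored key
    weights = softmax(scores)           # attention / Hopfield retrieval weights
    return weights @ y_ctx              # weighted recall of the stored values

# Toy Gaussian linear regression task: y = w^T x + noise.
rng = np.random.default_rng(0)
d, n_ctx = 8, 32
w_true = rng.normal(size=d)
X_ctx = rng.normal(size=(n_ctx, d))
y_ctx = X_ctx @ w_true + 0.1 * rng.normal(size=n_ctx)

x_query = rng.normal(size=d)
print("prediction:  ", icl_attention_predict(X_ctx, y_ctx, x_query, beta=2.0))
print("ground truth:", x_query @ w_true)
```

In this toy view, prompts whose examples are close to the query, correctly labeled, and drawn from the same distribution concentrate the attention weights on the relevant memories, which is consistent with the qualitative behaviors described in the lay summary.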
Primary Area: Deep Learning->Foundation Models
Keywords: In-Context Learning, Large Language Model, Foundation Model, Transformer, Attention, Modern Hopfield Model, Associative Memory
Submission Number: 839