Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data

Published: 26 May 2026, Last Modified: 26 May 2026, Venue: ICML 2026 FoGen Workshop Poster, Readers: Everyone, License: CC BY 4.0
Keywords: associative memory, discrete diffusion modeling, language modeling, memorization, generalization, retrieval
TL;DR: Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Test Data
Abstract: When do language diffusion models memorize their training data, and how can we quantitatively assess their true generative regime? We address these questions by establishing that Uniform-based Discrete Diffusion Models (UDDMs) fundamentally behave as Associative Memories (AMs) ***with emergent creative capabilities***. The core idea of an AM is to reliably recover stored data points as $\textit{memories}$ by establishing distinct basins of attraction around them. Historically, models like Hopfield networks use an explicit energy function to guarantee these stable attractors. We broaden this perspective by observing that an energy function is not strictly necessary, as basins of attraction can also be formed via conditional likelihood maximization. These conditional dynamics allow factual recall, in which the UDDM recognizes unseen test sequences as fixed points and recovers their original tokens from partially corrupted versions, to coexist with the capability of synthesizing novel sentences. We show that, as the training dataset size increases, basins around training data points shrink while basins around unseen test data points expand, eventually becoming indistinguishable from one another. This memorization-to-generalization transition can also be detected using the conditional entropy of predicted tokens, which vanishes in the memorization regime.
Submission Number: 48
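
The entropy diagnostic mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a model exposes one predicted probability distribution per token position, and the function names and the toy 4-token vocabulary are hypothetical. A distribution that puts all its mass on one token has zero entropy, as the abstract describes for the memorization regime, while a spread-out distribution has positive entropy.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted token distribution."""
    # Zero-probability entries contribute nothing and are skipped to avoid log(0).
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def mean_conditional_entropy(pred_dists: Sequence[Sequence[float]]) -> float:
    """Average entropy over the predicted distributions of all positions."""
    return sum(token_entropy(d) for d in pred_dists) / len(pred_dists)


# Hypothetical predictions over a 4-token vocabulary, one row per position.
# "memorized": every position is predicted with full confidence.
memorized = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
# "spread": every position is predicted uniformly at random.
spread = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]

print(mean_conditional_entropy(memorized))
print(mean_conditional_entropy(spread))
```

In this toy setting the first value is zero and the second equals the logarithm of the vocabulary size, the maximum possible entropy; how the paper conditions and aggregates these quantities is not specified in the abstract.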