Language Diffusion Models are Associative Memories
Keywords: associative memory, discrete diffusion models, language modeling, memorization, generalization
TL;DR: As the training dataset size increases, we observe the emergence of a significant entropy gap, where the conditional entropy of most tokens no longer vanishes. This entropy gap corresponds to the Discrete Diffusion Models' generalization regime.
Abstract: Hopfield networks are energy-based Associative Memory (AM) models, designed to store and retrieve their training data points as \textit{local minima}, or memories, of their energy function. Although commonly studied in the image domain, they have not yet been thoroughly investigated in language modeling tasks. In this work, we demonstrate that Uniform-based Discrete Diffusion Models (UDDMs) computationally behave as AMs, relying on their conditional likelihood rather than an energy function. Specifically, by analyzing the model's token recovery capabilities, we identify a distinct memorization-to-generalization transition governed by the size of the training dataset. The low-data regime, where UDDMs exhibit near-perfect token recovery, is characterized by vanishing conditional entropy of the token probabilities, as expected from a well-designed AM network. As the training dataset size increases, we observe the emergence of a significant \textit{entropy gap}, where the conditional entropy of most tokens no longer vanishes. This entropy gap corresponds to the UDDMs' generalization regime.
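The abstract's key diagnostic is the conditional entropy of each token's predicted distribution. Below is a minimal illustrative sketch, assuming this quantity is the Shannon entropy of the denoiser's per-position conditional distribution; the function name, tensor shapes, and vocabulary size are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def per_token_conditional_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of each token's predicted distribution.

    logits: (sequence_length, vocab_size) unnormalized scores for the
    conditional distribution p(x_i | context) produced by the denoiser.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)  # shape: (sequence_length,)


# Near-zero entropies would indicate confident token recovery (memorization);
# a persistent nonzero "entropy gap" would indicate the generalization regime.
logits = torch.randn(16, 50257)  # hypothetical 16-token sequence
entropy = per_token_conditional_entropy(logits)
print(entropy.mean().item())
```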
Submission Number: 37