Track: long paper (up to 5 pages)
Keywords: interpretability, memorization, knowledge tracing
TL;DR: This paper investigates how Large Language Models use linear associative memories to store and retrieve facts, revealing that subject-token activations show higher cross-talk, i.e., lower orthogonality, during factual recall.
Abstract: Large Language Models (LLMs) exhibit remarkable capacities to store and retrieve factual knowledge, yet the precise mechanisms by which they encode and recall this information remain under debate. Two main frameworks have been proposed to explain memory storage within transformer feed-forward layers: (1) a key-value memory view and (2) a linear associative memory (LAM) view. In this paper, we investigate the extent to which the second MLP matrix in LLMs behaves as a LAM. By measuring pairwise angles between the input activation vectors that serve as key vectors under the LAM view, we find that the second MLP matrix exhibits relatively high orthogonality and minimal cross-talk, supporting the LAM interpretation for generic retrieval. However, we also discover that subject-token representations used in factual recall are significantly less orthogonal, indicating greater interference and entanglement. This implies that editing factual “memories” within these matrices may trigger unintended side effects on other related knowledge. Our results highlight both the promise and the pitfalls of viewing feed-forward layers as linear associative memories, underscoring the need for careful strategies when modifying factual representations in LLMs.
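To make the measurement concrete, below is a minimal sketch (not the authors' code) of the pairwise-angle analysis the abstract describes: collect the activation vectors that feed the second MLP matrix, treat them as key vectors under the LAM view, and inspect how far their pairwise cosine similarities deviate from zero. The array name `activations` and the dimensions are illustrative assumptions.

```python
# Sketch of the cross-talk measurement described in the abstract, assuming
# `activations` is an (n_keys, d_mlp) array of input activations to the
# second MLP matrix collected from a forward pass.
import numpy as np

def pairwise_crosstalk(activations: np.ndarray) -> np.ndarray:
    """Return the matrix of pairwise |cosine similarities| between key vectors.

    Off-diagonal entries near 0 mean the keys are nearly orthogonal (low
    cross-talk); larger values indicate interference between stored
    associations in the LAM reading of the matrix.
    """
    # Normalize each key vector to unit length.
    norms = np.linalg.norm(activations, axis=1, keepdims=True)
    unit = activations / np.clip(norms, 1e-12, None)
    # Cosine similarity between every pair of keys.
    return np.abs(unit @ unit.T)

if __name__ == "__main__":
    # Example with random keys: in high dimensions random vectors are nearly
    # orthogonal, so the mean off-diagonal |cosine| should be small.
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((100, 4096))  # hypothetical d_mlp = 4096
    C = pairwise_crosstalk(keys)
    off_diag = C[~np.eye(C.shape[0], dtype=bool)]
    print(f"mean off-diagonal |cos|: {off_diag.mean():.4f}")
```

Running the same routine on generic retrieval activations versus subject-token activations would reproduce, in spirit, the comparison the paper reports: lower off-diagonal similarity for the former, higher for the latter.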
Submission Number: 9