Percept Activation Graph (PAG): Decomposing LLM Computation into Perceptual Entities and Their Interactions

18 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0.
Keywords: Interpretability, Cognitive Neuroscience
Abstract: Understanding the computations performed by large-scale neural network models remains an important challenge. Recent work has motivated holistic approaches that focus on the population-level dynamics of neurons in these networks, suggesting that these dynamics reflect statistical regularities in the data and that the human perceptual tendency toward chunking can be leveraged to identify recurring cognitive entities. We extend this line of work by introducing new techniques, inspired by cognitive science and neuroscience, for analyzing large language model (LLM) computations. We formalize chunking in neural data through the perceiving function, which maps recurring high-dimensional activity patterns into a dictionary of recognizable entities. Building on this definition, we decompose the neural activations of large-scale networks into a finite set of chunks and find that model activations exhibit compressible regularities across both tokens and layers. Based on these chunks, we define the Percept Activation Graph (PAG), which captures the causal structure of chunks across layers. We apply this analysis to LLMs to examine how they represent compositionality in context, analyzing layer-wise activations during in-context learning on the SCAN meta-learning dataset. Within the PAG, we identify distinct components that encode primitives and demonstrate that perturbing these components predictably alters the model's compositional generalization behavior. Our method provides a pathway to automatically extract structured relations between chunks that causally and controllably influence the computation of large-scale neural networks.
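To make the abstract's pipeline concrete, here is a minimal, hypothetical sketch of the two steps it describes: a perceiving function that maps recurring activations to a discrete chunk dictionary, and a graph over (layer, chunk) entities. Nothing below is the authors' released code; the clustering choice (k-means), the co-occurrence edge weighting used as a stand-in for the paper's causal-structure estimate, and all function and variable names (`perceive`, `build_pag`) are illustrative assumptions.

```python
# Hypothetical sketch: chunk extraction + Percept Activation Graph construction.
# Assumptions (not from the paper): k-means as the perceiving function,
# per-token layer-to-layer co-occurrence as a proxy for causal edges.
import numpy as np
from sklearn.cluster import KMeans
import networkx as nx


def perceive(acts, n_chunks=32, seed=0):
    """Map recurring high-dimensional activations to a discrete dictionary.

    acts: (n_tokens, d_model) activations from one layer.
    Returns per-token chunk labels and the dictionary (cluster centroids).
    """
    km = KMeans(n_clusters=n_chunks, n_init=10, random_state=seed).fit(acts)
    return km.labels_, km.cluster_centers_


def build_pag(layer_acts, n_chunks=32):
    """Build a PAG-like graph: nodes are (layer, chunk) entities; an edge
    links a chunk at layer l to the chunk at layer l+1 observed at the
    same token position, weighted by how often that transition occurs."""
    G = nx.DiGraph()
    labels = [perceive(a, n_chunks)[0] for a in layer_acts]
    for l in range(len(labels) - 1):
        for src, dst in zip(labels[l], labels[l + 1]):
            u, v = (l, int(src)), (l + 1, int(dst))
            w = G.get_edge_data(u, v, {"weight": 0})["weight"]
            G.add_edge(u, v, weight=w + 1)
    return G


# Toy usage: random stand-ins for a 4-layer model's activations over 256 tokens.
rng = np.random.default_rng(0)
layer_acts = [rng.normal(size=(256, 64)) for _ in range(4)]
pag = build_pag(layer_acts, n_chunks=8)
print(pag.number_of_nodes(), "nodes,", pag.number_of_edges(), "edges")
```

On real model activations, one would replace the random arrays with cached residual-stream activations per layer and replace the co-occurrence weights with whatever causal estimate the paper actually uses (e.g., effects measured under perturbation of chunk components).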
Primary Area: interpretability and explainable AI
Submission Number: 10241