Large language models implicitly learn to straighten neural sentence trajectories to construct a predictive representation of natural language.
Keywords: (Cognitive/Neuroscience) Language, Structured Prediction, (Application) Natural Language and Text Processing
Abstract: Predicting upcoming events is critical to our ability to effectively interact with our
environment and conspecifics. In natural language processing, transformer models,
which are trained on next-word prediction, appear to construct a general-purpose
representation of language that can support diverse downstream tasks. However, we
still lack an understanding of how a predictive objective shapes such representations.
Inspired by recent work in vision neuroscience (Hénaff et al., 2019), here we test a
hypothesis about predictive representations of autoregressive transformer models.
In particular, we test whether the neural trajectory of a sequence of words in a
sentence becomes progressively straighter as it passes through the layers of the
network. The key insight behind this hypothesis is that straighter trajectories should
facilitate prediction via linear extrapolation. We quantify straightness using a 1-
dimensional curvature metric, and present four findings in support of the trajectory
straightening hypothesis: i) In trained models, the curvature progressively decreases
from the first to the middle layers of the network. ii) Models that perform better on
the next-word prediction objective, including larger models and models trained on
larger datasets, exhibit greater decreases in curvature, suggesting that this improved
ability to straighten sentence neural trajectories may be the underlying driver of
better language modeling performance. iii) Given the same linguistic context, the
sequences generated by the model have lower curvature than the ground
truth (the actual continuations observed in a language corpus), suggesting that
the model favors straighter trajectories for making predictions. iv) A consistent
relationship holds between the average curvature and the average surprisal of
sentences in the middle layers of models, such that sentences with straighter neural
trajectories also have lower surprisal. Importantly, untrained models do not exhibit
these behaviors. In tandem, these results support the trajectory straightening
hypothesis and provide a possible mechanism for how the geometry of the internal
representations of autoregressive models supports next-word prediction.
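As a rough illustration of the curvature measurement described above, the sketch below computes a per-layer curvature for one sentence's hidden-state trajectory, assuming the metric is the average angle between consecutive difference vectors (in the spirit of Hénaff et al., 2019). The choice of GPT-2 via Hugging Face transformers, the helper name average_curvature, and the example sentence are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: per-layer curvature of a sentence's hidden-state trajectory.
# Assumption: curvature is the mean angle between consecutive difference
# vectors of the trajectory; the paper's exact preprocessing may differ.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

def average_curvature(states: np.ndarray) -> float:
    """Mean angle (radians) between consecutive displacement vectors of a
    trajectory given as an array of shape (num_tokens, hidden_dim)."""
    diffs = np.diff(states, axis=0)                        # v_t = x_{t+1} - x_t
    diffs /= np.linalg.norm(diffs, axis=1, keepdims=True)  # unit displacements
    cosines = np.sum(diffs[:-1] * diffs[1:], axis=1)       # cos(angle) between v_t, v_{t+1}
    return float(np.mean(np.arccos(np.clip(cosines, -1.0, 1.0))))

# Hypothetical model choice; any autoregressive transformer exposing
# per-layer hidden states would work the same way.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

sentence = "The cat sat on the mat because it was tired."
inputs = tok(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states  # tuple: (embeddings, layer 1, ..., layer L)

for layer, h in enumerate(hidden):
    states = h[0].numpy()                   # (num_tokens, hidden_dim)
    print(f"layer {layer:2d}: curvature = {average_curvature(states):.3f} rad")
```

Under the straightening hypothesis, this per-layer curvature would be expected to decrease from the early to the middle layers of a trained model but not of an untrained one.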
Supplementary Material: pdf
Submission Number: 7431