Geometric Signatures of Compositionality Across a Language Model’s Lifetime

Published: 23 Oct 2024 · Last Modified: 24 Feb 2025 · NeurReps 2024 Poster · License: CC BY 4.0
Keywords: compositionality, geometry, intrinsic dimension, language models
TL;DR: We use intrinsic dimensionality to analyze the compositionality of representations in language models.
Abstract: Compositionality, the notion that the meaning of an expression is constructed from the meaning of its parts and syntactic rules, permits the infinite productivity of human language. For the first time, artificial language models (LMs) are able to match human performance in a number of compositional generalization tasks. However, much remains to be understood about the computational mechanisms underlying these abilities. We take a geometric approach to this problem by relating the degree of compositionality in data to the intrinsic dimensionality of their representations under an LM, a measure of feature complexity. We show that the degree of dataset compositionality is reflected in representations’ intrinsic dimensionality, and that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Overall, our results highlight that linear and nonlinear dimensionality measures capture different and complementary views of data complexity.
Submission Number: 61
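
The abstract contrasts linear and nonlinear dimensionality measures without naming specific estimators. Below is a minimal sketch of how the two kinds of measure can disagree on the same point cloud, assuming TwoNN (Facco et al., 2017), a widely used nonlinear intrinsic-dimension estimator, and a PCA explained-variance threshold as the linear measure; the paper's actual estimators may differ, and the function names and toy data are illustrative, not taken from the paper.

```python
# Sketch: linear vs. nonlinear dimensionality of a representation matrix X
# (n_points x n_features). Assumptions: TwoNN for the nonlinear intrinsic
# dimension, PCA variance threshold for the linear dimension.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def twonn_id(X: np.ndarray) -> float:
    """Nonlinear intrinsic dimension via the TwoNN maximum-likelihood estimator."""
    # Distances to each point's two nearest neighbors (index 0 is the point itself).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]        # ratio of 2nd- to 1st-neighbor distance
    return len(mu) / np.sum(np.log(mu))   # MLE of the intrinsic dimension

def linear_dim(X: np.ndarray, var_threshold: float = 0.9) -> int:
    """Linear dimension: number of principal components reaching the variance threshold."""
    cum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    return int(np.searchsorted(cum, var_threshold) + 1)

# Toy example: a 1-D curve embedded nonlinearly in 50 dimensions.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, size=2000)
X = np.stack([np.sin(k * t) for k in range(1, 51)], axis=1)
print(f"TwoNN ID ~ {twonn_id(X):.2f}")   # close to 1: the manifold is a curve
print(f"Linear dim = {linear_dim(X)}")   # much larger: the curve spans many PCs
```

On this toy data the nonlinear estimate stays near 1 while the linear measure reports dozens of components, which is the sense in which the two measures capture different, complementary views of complexity.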