Keywords: compositionality, geometry, intrinsic dimension, language models
TL;DR: We use intrinsic dimensionality to analyze the compositionality of representations in language models.
Abstract: Compositionality, the notion that the meaning of an expression is constructed from the meanings of its parts and syntactic rules, permits the infinite productivity of human language. For the first time, artificial language models (LMs) are able to match human performance on a number of compositional generalization tasks. However, much remains to be understood about the computational mechanisms underlying these abilities. We take a geometric approach to this problem by relating the degree of compositionality in data to the intrinsic dimensionality of their representations under an LM, a measure of feature complexity. We show that the degree of dataset compositionality is reflected in representations’ intrinsic dimensionality, and that the relationship between compositionality and geometric complexity arises due to linguistic features learned over training. Overall, our results highlight that linear and nonlinear dimensionality measures capture different and complementary views of data complexity.
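The abstract does not specify which intrinsic-dimension estimator is used; a minimal sketch of one common nonlinear choice, the TwoNN estimator (Facco et al., 2017, maximum-likelihood variant), applied to a toy point cloud rather than actual LM representations:

```python
import numpy as np

def two_nn_id(X):
    """TwoNN intrinsic-dimension estimate (MLE variant).

    For each point, mu = r2 / r1 is the ratio of distances to its second
    and first nearest neighbors; under a locally uniform density mu follows
    a Pareto law with shape equal to the intrinsic dimension d, giving the
    maximum-likelihood estimate d = N / sum(log mu).
    """
    sq = np.sum(X ** 2, axis=1)
    # squared pairwise Euclidean distances via the Gram-matrix identity
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)              # exclude self-distances
    two = np.partition(d2, 1, axis=1)[:, :2]  # squared dists to the 2 nearest neighbors
    mu = np.sqrt(two[:, 1] / two[:, 0])       # r2 / r1 per point
    return len(X) / np.sum(np.log(mu))

# sanity check: a 2-D manifold linearly embedded in 10-D ambient space
rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 2)) @ rng.normal(size=(2, 10))
print(f"estimated ID: {two_nn_id(X):.2f}")    # should land near the true value of 2
```

Note how the estimate tracks the manifold dimension (2) rather than the ambient dimension (10), which is the sense in which nonlinear measures like this can diverge from linear ones such as PCA rank.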
Submission Number: 61