Despite the widespread use of large language models, we still lack a unified notation for thinking about and describing their representational spaces. This limits our ability to understand how they work. Ideally, we would understand how their representations are structured, how that structure emerges over training, and what kinds of structures are desirable. Unfortunately, we as humans tend not to have strong intuitions about high-dimensional vector spaces. Here we propose an information-theoretic approach to quantifying structure in deep-learning models. We introduce a novel method for estimating the entropy of vector spaces, and use it to quantify how much of the information in the model we can explain with a set of labels. This reveals when regularities emerge in representation space with respect to token, bigram, and trigram information in the input. As these models learn from human language data, we formalise this in terms of three linguistically derived quantities: regularity, variation, and disentanglement. These show how larger models become proportionally more disentangled. We are also able to predict downstream task performance on GLUE benchmarks from representational structure at the end of pre-training, but before fine-tuning.
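The abstract does not specify the entropy estimator itself, but the idea of "information explained by a set of labels" can be illustrated with the standard decomposition I(X; Y) = H(X) − H(X | Y). The sketch below is a minimal illustration only, assuming a simple Gaussian differential-entropy approximation as a stand-in for the paper's novel estimator; the function names and toy data are hypothetical.

```python
# Minimal sketch (NOT the paper's estimator): measure how much of the entropy
# of a representation space is explained by labels, using a Gaussian
# differential-entropy approximation H(X) ≈ 0.5 * logdet(2*pi*e * Cov(X))
# and I(X; Y) = H(X) - sum_y p(y) * H(X | Y = y).
import numpy as np

def gaussian_entropy(X: np.ndarray, eps: float = 1e-6) -> float:
    """Differential entropy (nats) of row vectors X under a Gaussian assumption."""
    d = X.shape[1]
    cov = np.cov(X, rowvar=False) + eps * np.eye(d)  # regularise for stability
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def label_explained_information(X: np.ndarray, y: np.ndarray) -> float:
    """Estimate I(X; Y) = H(X) - H(X | Y), in nats."""
    h_x = gaussian_entropy(X)
    h_x_given_y = 0.0
    for label in np.unique(y):
        mask = y == label
        h_x_given_y += mask.mean() * gaussian_entropy(X[mask])
    return h_x - h_x_given_y

# Toy usage: 1000 hidden states of width 16, labelled by a hypothetical token id.
rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=1000)
X = rng.normal(size=(1000, 16)) + y[:, None]  # representations shift with label
print(label_explained_information(X, y))
```

Under this illustrative setup, a higher value means the labels account for more of the structure in the representation space, which is the kind of quantity the abstract uses to track token, bigram, and trigram regularities across training.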