Information Structure in Large Language Models

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Large Language Models, Interpretability, Information Theory, Language
TL;DR: Information structure allows us to describe what training looks like in LLMs, and which representational structures from pre-training will generalise best after post-training.
Abstract:

Despite the widespread use of large language models, we still lack a unified notation for thinking about and describing their representational spaces. This limits our ability to understand how they work. Ideally, we would understand how their representations are structured, how that structure emerges over training, and what kinds of structures are desirable. Unfortunately, we as humans tend not to have strong intuitions about high-dimensional vector spaces. Here we propose an information-theoretic approach to quantifying structure in deep-learning models. We introduce a novel method for estimating the entropy of vector spaces, and use it to quantify how much of the information in the model we can explain with a set of labels. This can show when regularities emerge in representation space with respect to token, bigram, and trigram information in the input. As these models learn from human language data, we formalise this in terms of three linguistically derived quantities: regularity, variation, and disentanglement. These show how larger models become proportionally more disentangled. We are also able to predict downstream task performance on GLUE benchmarks from representational structure at the end of pre-training, before any fine-tuning.
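The abstract does not specify the entropy estimator, so as an illustration only, here is a minimal sketch of one common way to quantify how much information in a representation space a set of labels explains: fit a Gaussian to the representations, take its differential entropy, and subtract the label-conditional entropy. All function names, the Gaussian assumption, and the synthetic data below are this sketch's own, not the paper's method.

```python
import numpy as np

def gaussian_entropy(X, eps=1e-6):
    """Differential entropy (nats) of samples X under a Gaussian fit:
    H = 0.5 * log det(2*pi*e * Sigma), with a small ridge for stability."""
    d = X.shape[1]
    cov = np.cov(X, rowvar=False) + eps * np.eye(d)
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * cov)
    return 0.5 * logdet

def label_explained_information(X, y):
    """I(X; Y) ~= H(X) - sum_c p(c) * H(X | Y = c), under the Gaussian fit."""
    h_x = gaussian_entropy(X)
    h_x_given_y = 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        h_x_given_y += (len(Xc) / len(X)) * gaussian_entropy(Xc)
    return h_x - h_x_given_y

# Toy check: representations whose mean shifts with a binary label,
# so the label should explain a positive amount of information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
X = rng.normal(size=(2000, 8)) + 3.0 * y[:, None]
mi = label_explained_information(X, y)  # positive, in nats
```

The Gaussian fit is only one choice of estimator; any method that yields marginal and label-conditional entropies of the representation space supports the same decomposition.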

Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11748