Keywords: Large Language Models, Interpretability
TL;DR: We introduce StructLens, an analytical framework designed to reveal the holistic relationships among internal structures based on their inter-token relationships within a layer.
Abstract: Language exhibits inherent structures,
a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures as well.
While interpretability research has investigated the components of language models, existing approaches focus on local inter-token relationships within layers or modules (e.g., Multi-Head Attention), leaving global inter-layer relationships largely overlooked.
To address this gap, we introduce StructLens, an analytical framework designed to reveal how internal structures relate holistically through their inter-token connections within a layer.
StructLens constructs maximum spanning trees based on residual streams, analogous to dependency parsing, and leverages the tree properties to quantify inter-layer distance (or similarity) from a structural perspective.
Our findings demonstrate that StructLens yields an inter-layer similarity pattern that is distinctly different from conventional cosine similarity. Moreover, this structure-aware similarity proves beneficial for practical tasks such as layer pruning, highlighting the effectiveness of structural analysis for understanding and optimizing language models.
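The core idea in the abstract can be illustrated with a minimal sketch (this is not the authors' code, and the dot-product similarity, Prim-style tree construction, and edge-overlap distance below are all illustrative assumptions): for each layer, build a maximum spanning tree over pairwise token similarities in the residual stream, then compare two layers by how many tree edges they share.

```python
# Illustrative sketch, NOT the StructLens implementation: the similarity
# measure (dot product) and the edge-overlap distance are assumptions.
import numpy as np

def max_spanning_tree(hidden):
    """Prim's algorithm on pairwise token similarities.

    hidden: (n_tokens, dim) residual-stream states for one layer.
    Returns the set of (i, j) edges of the maximum spanning tree.
    """
    sim = hidden @ hidden.T            # pairwise token similarities
    n = sim.shape[0]
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    best = sim[0].copy()               # best similarity into the tree
    parent = np.zeros(n, dtype=int)    # tree endpoint achieving `best`
    edges = set()
    for _ in range(n - 1):
        # Pick the unvisited token most similar to the current tree.
        j = int(np.argmax(np.where(visited, -np.inf, best)))
        visited[j] = True
        edges.add(tuple(sorted((int(parent[j]), j))))
        # Relax: token j may now be the better attachment point.
        update = sim[j] > best
        parent[update] = j
        best = np.maximum(best, sim[j])
    return edges

def tree_distance(h_a, h_b):
    """Structural distance between two layers: 1 - shared-edge fraction."""
    ea, eb = max_spanning_tree(h_a), max_spanning_tree(h_b)
    return 1.0 - len(ea & eb) / len(ea)

# Toy usage: identical layers have distance 0; perturbed layers drift.
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(8, 16))
layer_b = layer_a + 0.5 * rng.normal(size=(8, 16))
print(tree_distance(layer_a, layer_a))  # 0.0
print(tree_distance(layer_a, layer_b))
```

Unlike cosine similarity between flattened hidden states, this distance depends only on the *relative ordering* of token-pair similarities, so two layers with very different activation scales can still be structurally identical.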
Primary Area: interpretability and explainable AI
Submission Number: 23873