Keywords: Mechanistic interpretability, Large language models, Network science, Graph theory, Emergent abilities
Abstract: Understanding scaling laws and emergent abilities in Large Language Models
(LLMs) remains a key challenge for interpretability. While much prior work in
mechanistic interpretability has focused on learned representations, the attention
matrix—which governs information flow—has received far less scrutiny. Moreover,
the attention matrix has not yet been analysed from a theoretical network science
perspective. In this work, we present a pipeline for dynamic
graph construction from attention matrices, introduce a novel head aggregation
technique based on entropy, and analyse the attention graphs from a network sci-
ence perspective to draw interpretability insights. Our experiments show that the
entropy-based head aggregation preserves attention details, and that key graph
metrics—specifically the clustering coefficient and maximum PageRank—correlate
with improved model correctness and emergent abilities in LLMs. Notably, our
findings indicate that larger models exhibit higher maximum PageRank and lower
clustering coefficients, suggesting they reason differently by attending more glob-
ally and selectively focusing on key hotspots.
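The pipeline the abstract describes—aggregating per-head attention into a single graph and computing clustering coefficient and maximum PageRank—can be sketched as follows. This is a minimal illustration using networkx; the entropy weighting and the edge-thresholding rule shown here are assumptions for the sketch, not the paper's exact method.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_heads, n_tokens = 4, 8

# Simulated attention: one row-stochastic matrix per head.
attn = rng.random((n_heads, n_tokens, n_tokens))
attn /= attn.sum(axis=-1, keepdims=True)

# Entropy-based head aggregation (assumed form): weight each head by the
# inverse of its mean row entropy, so lower-entropy (sharper) heads
# contribute more to the aggregated matrix.
eps = 1e-12
head_entropy = -(attn * np.log(attn + eps)).sum(axis=-1).mean(axis=-1)
weights = 1.0 / head_entropy
weights /= weights.sum()
agg = np.einsum("h,hij->ij", weights, attn)

# Keep only above-uniform attention weights as edges (assumed threshold)
# and build a directed, weighted graph.
adj = np.where(agg > 1.0 / n_tokens, agg, 0.0)
G = nx.from_numpy_array(adj, create_using=nx.DiGraph)

# Graph metrics highlighted in the abstract.
clustering = nx.average_clustering(G.to_undirected())
max_pagerank = max(nx.pagerank(G, weight="weight").values())
print(f"clustering={clustering:.3f}, max PageRank={max_pagerank:.3f}")
```

Under the abstract's interpretation, a higher maximum PageRank indicates attention concentrating on a few hotspot tokens, while a lower clustering coefficient indicates more global, less locally clustered attention flow.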
Primary Area: interpretability and explainable AI
Submission Number: 23600