Keywords: decomposition, transformer, neural-symbolic, n-grams, interpretability, controllability
TL;DR: We introduce jet expansions: operators that "cut through" LLM entanglement, separating out the parts of the computation of interest and enabling systematic model inspection, e.g., via n-gram tables.
Abstract: Large language models are becoming general knowledge engines for diverse applications. However, their computations are deeply entangled after training, resisting modularization, which complicates interpretability, auditing, and long-term maintenance. We introduce Jet Expansions, a framework for expanding computational graphs using jet operators that generalize truncated Taylor series. Our method systematically decomposes language models into explicit input-to-output computational paths and complementary remainders. This functional decomposition provides a principled, knife-like operator for cutting through entanglement in LLMs, enabling scalable model inspection. We demonstrate how Jet Expansions ground and subsume the popular interpretability technique Logit Lens, reveal a (super-)exponential path structure with respect to recursive residual depth, and support several interpretability applications, including sketching a transformer language model with $n$-gram statistics extracted from its computations and indexing model toxicity levels *without* curated benchmarks.
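To make the core idea concrete, here is a minimal sketch of the simplest instance of the construction the abstract describes: a first-order truncated Taylor expansion of a single residual block, splitting it into explicit kept paths plus a remainder. The names (`mlp`, `block`, the expansion point `a`) are illustrative assumptions, not the paper's API; the paper's jet operators generalize this truncation.

```python
# Hypothetical sketch: first-order truncation of one residual block y = x + f(x),
# the simplest case of a jet expansion. All names here are illustrative.
import torch

torch.manual_seed(0)
d = 8
mlp = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.GELU(), torch.nn.Linear(d, d))

def block(x):
    # one residual block: identity path plus a nonlinear path f(x)
    return x + mlp(x)

x = torch.randn(d)
a = torch.zeros(d)  # expansion point (assumed here; the paper's choice may differ)

# 0th-order term: f(a), a constant path (Logit-Lens-like when read through the unembedding)
f_a = mlp(a)

# 1st-order term: Jacobian-vector product J_f(a) @ (x - a), the linearized path
_, jvp = torch.autograd.functional.jvp(mlp, (a,), (x - a,))

truncated = x + f_a + jvp         # the explicit paths kept by the expansion
remainder = block(x) - truncated  # the complementary remainder the truncation discards

print((remainder.norm() / block(x).norm()).item())  # small when f is near-linear around a
```

Stacking such blocks and expanding each one multiplies the number of explicit paths, which is consistent with the (super-)exponential path structure in depth that the abstract mentions.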
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 25587