Keywords: transformer, decomposition, interpretability, neural-symbolic, n-grams, XAI
TL;DR: After training, LLM computations become deeply entangled. For interpretability, we introduce a knife-like operator that cuts through this entanglement, separating the part we care about from the remainder and enabling scalable model inspection.
Abstract: Large language models are becoming general knowledge engines for diverse applications. However, their computations are deeply entangled after training, resisting modularization, which complicates interpretability, auditing, and long-term maintenance. We introduce Jet Expansions, a framework for expanding computational graphs using jet operators that generalize truncated Taylor series. Our method systematically decomposes language models into explicit input-to-output computational paths and complementary remainders. This functional decomposition provides a principled, knife-like operator for cutting through entanglement in LLMs, enabling scalable model inspection. We demonstrate how Jet Expansions ground and subsume the popular interpretability technique Logit Lens, reveal a (super-)exponential path structure with respect to recursive residual depth, and support several interpretability applications, including sketching a transformer language model with $n$-gram statistics extracted from its computations and indexing model toxicity levels *without* curated benchmarks.
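For orientation, the classical object that the abstract says jet operators generalize is the order-$k$ truncated Taylor series together with its remainder. The display below is a generic sketch of that standard notion (not the paper's exact operator); the symbols $J^k_{x_0}$ and $R^k_{x_0}$ are illustrative notation for the truncated expansion of $f$ at a point $x_0$ and its complementary remainder:

$$
J^k_{x_0}[f](x) \;=\; \sum_{i=0}^{k} \frac{f^{(i)}(x_0)}{i!}\,(x - x_0)^i,
\qquad
f(x) \;=\; \underbrace{J^k_{x_0}[f](x)}_{\text{explicit term}} \;+\; \underbrace{R^k_{x_0}[f](x)}_{\text{remainder}}.
$$

In this spirit, the abstract's "explicit input-to-output computational paths and complementary remainders" mirror the split between the expanded term and the remainder.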
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 25587