Keywords: Interpretability tooling and software, Chain of Thought/Reasoning models, AI Safety
Other Keywords: Long Context, LLM, scalable, scaling
TL;DR: A new scalable tool for Mechanistic Interpretability in long context LLMs.
Abstract: As Large Language Models (LLMs) scale to million-token contexts, traditional Mechanistic Interpretability techniques for analyzing attention scale quadratically with context length, demanding terabytes of memory beyond 100,000 tokens. We introduce Sparse Tracing, a novel technique that leverages dynamic sparse attention to efficiently analyze long-context attention patterns. We present Stream, a compilable hierarchical pruning algorithm that estimates per-head sparse attention masks in near-linear time $O(T \log T)$ and linear space $O(T)$, enabling one-pass interpretability at scale. Stream performs a binary-search-style refinement to retain only the top-$k$ key blocks per query while preserving the model's next-token behavior. By tuning the block size and $k$, practitioners can finely control the resolution (e.g., sentence level vs. paragraph level) and the amount of pruning. We apply Stream to long chain-of-thought reasoning traces and identify thought anchors while pruning 97--99\% of token interactions. On the RULER needle-in-a-haystack benchmark, Stream preserves the critical retrieval paths while discarding 90--96\% of interactions and exposes layer-wise routes from the needle to the output. Our method offers a practical drop-in tool for analyzing attention patterns, computing salience scores, and tracing information flow without terabytes of caches. By making long-context interpretability feasible on consumer GPUs, Sparse Tracing helps democratize chain-of-thought monitoring. Code is available at \url{https://github.com/spotify-research/stream-mechinterp/}.
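The reference implementation lives in the linked repository; as a rough illustration of the coarse-to-fine selection described in the abstract, the sketch below shows one way a binary-search-style, per-query top-$k$ block mask could be estimated. All names (`stream_like_mask`, `coarse_block`, `min_block`, `keep_k`) and the mean-pooled block scoring are assumptions for illustration, not the authors' API.

```python
# Hypothetical sketch of hierarchical top-k block pruning for one attention head.
# Assumes queries `q` and keys `k` of shape (T, d); not the authors' implementation.
import torch

def block_scores(q, k, block):
    """Mean-pool keys into blocks and score every block against every query."""
    T, d = k.shape
    n_blocks = (T + block - 1) // block
    pad = n_blocks * block - T
    k_pad = torch.nn.functional.pad(k, (0, 0, 0, pad))
    k_blocks = k_pad.view(n_blocks, block, d).mean(dim=1)   # (n_blocks, d)
    return q @ k_blocks.T / d ** 0.5                        # (T, n_blocks)

def stream_like_mask(q, k, coarse_block=256, min_block=16, keep_k=8):
    """Coarse-to-fine (binary-search-style) selection of key blocks per query.

    Returns a boolean (T, T) mask of retained query-key interactions.
    """
    T, _ = q.shape
    block = coarse_block
    candidates = torch.ones(T, (T + block - 1) // block, dtype=torch.bool)
    while block > min_block:
        scores = block_scores(q, k, block)
        scores = scores.masked_fill(~candidates, float("-inf"))
        top = scores.topk(min(keep_k, scores.shape[1]), dim=-1).indices
        kept = torch.zeros_like(candidates)
        kept.scatter_(1, top, True)
        # Halve the block size: each retained block splits into two children.
        block //= 2
        n_blocks = (T + block - 1) // block
        candidates = kept.repeat_interleave(2, dim=1)[:, :n_blocks]
    # Final pass at the finest block size, then expand to a token-level mask.
    scores = block_scores(q, k, block).masked_fill(~candidates, float("-inf"))
    top = scores.topk(min(keep_k, scores.shape[1]), dim=-1).indices
    mask = torch.zeros(T, candidates.shape[1], dtype=torch.bool)
    mask.scatter_(1, top, True)
    token_mask = mask.repeat_interleave(block, dim=1)[:, :T]
    # Respect causality for autoregressive models.
    return token_mask & torch.ones(T, T).tril().bool()
```

In this sketch, larger `coarse_block` and smaller `keep_k` prune more aggressively, mirroring the resolution/pruning trade-off the abstract attributes to tuning the block size and $k$.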
Submission Number: 11