TimeSAE: Mechanistic Interpretability for Time-Series Foundation Models

Published: 01 Mar 2026 · Last Modified: 11 Apr 2026 · ICLR 2026 TSALM Workshop Poster · CC BY 4.0
Presentation Attendance: No, we cannot present in person
Keywords: Mechanistic Interpretability, Sparse Autoencoders (SAE), Activation Steering, Time-Series Foundation Models
Abstract: Time-Series Foundation Models (TSFMs) such as MOMENT-1-large have revolutionized forecasting but incur a "transparency debt": they function as opaque black boxes on which standard attribution methods fail. TimeSAE resolves this by rigorously adapting Sparse Autoencoders (SAEs) to the continuous time-series domain. The framework decomposes dense residual-stream activations into interpretable, monosemantic features, achieving a 99.9% sparsity ratio with high reconstruction fidelity (R^2 = 0.79). Unlike prior methods that rely on passive correlation, TimeSAE validates interpretability through causal intervention: latent activation steering reveals that specific features function as orthogonal control knobs, inducing predictable, linear shifts in downstream forecasts. These results confirm the Linear Representation Hypothesis for time series, demonstrating that complex physical dynamics can be disentangled into atomic signals. By bridging the gap between high-performance forecasting and mechanistic auditability, this framework transforms black-box models into reliable systems for safety-critical applications.
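To make the abstract's two core operations concrete, here is a minimal sketch of an SAE over residual-stream activations plus latent steering. This is an illustration only: the L1-penalized ReLU architecture and all names (SparseAutoencoder, d_model, d_sae, sae_loss, steer, feature_idx, alpha) are assumptions, since the abstract does not specify TimeSAE's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Decompose a dense activation x into sparse latent codes z."""
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x: torch.Tensor):
        z = F.relu(self.enc(x))   # sparse, non-negative feature activations
        x_hat = self.dec(z)       # reconstruction of the original activation
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction fidelity (drives R^2) plus an L1 penalty (drives sparsity).
    return F.mse_loss(x_hat, x) + l1_coeff * z.abs().mean()

def steer(sae: SparseAutoencoder, x: torch.Tensor, feature_idx: int, alpha: float):
    # Activation steering: add alpha times one decoder direction to the
    # residual-stream activation, i.e. a linear "control knob" for one feature.
    direction = sae.dec.weight[:, feature_idx]  # shape: (d_model,)
    return x + alpha * direction
```

Under the Linear Representation Hypothesis the abstract invokes, sweeping alpha for a fixed feature_idx should shift the downstream forecast approximately linearly along one semantic axis while leaving other features untouched.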
Track: Research Track (max 4 pages)
Submission Number: 103