Interpretability for Time Series Transformers using A Concept Bottleneck Framework

Published: 30 Sept 2025, Last Modified: 30 Sept 2025 · Mech Interp Workshop (NeurIPS 2025) Poster · CC BY 4.0
Keywords: Steering, Causal interventions, Understanding high-level properties of models
Other Keywords: Time series
TL;DR: Training framework for time series transformers to represent interpretable concepts
Abstract: Mechanistic interpretability focuses on *reverse engineering* the internal mechanisms learned by neural networks. We extend this focus and propose to mechanistically *forward engineer* models using a framework based on Concept Bottleneck Models. In the context of long-term time series forecasting, we modify the training objective to encourage a model to develop representations that are similar to predefined, interpretable concepts, using Centered Kernel Alignment. This steers the bottleneck components to learn the predefined concepts, while allowing the remaining components to learn other, undefined concepts. We apply the framework to the Vanilla Transformer, Autoformer, and FEDformer, and present an in-depth analysis on synthetic data and on a variety of benchmark datasets. We find that model performance remains largely unaffected, while interpretability improves substantially. Additionally, we verify the interpretation of the bottleneck components with an intervention experiment using activation patching.
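The abstract describes a training objective that combines the usual forecasting loss with a Centered Kernel Alignment (CKA) term between bottleneck activations and predefined concept values. Below is a minimal sketch of how such a combined loss could look; the use of linear CKA, the function names, and the weighting term `lam` are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear Centered Kernel Alignment between two activation matrices.

    x: (n_samples, d1) bottleneck component activations
    y: (n_samples, d2) predefined concept values
    Returns a similarity in [0, 1]; higher means the two representations
    agree up to an orthogonal transform and isotropic scaling.
    """
    x = x - x.mean(dim=0, keepdim=True)  # center each feature
    y = y - y.mean(dim=0, keepdim=True)
    numerator = (y.T @ x).norm(p="fro") ** 2
    denominator = (x.T @ x).norm(p="fro") * (y.T @ y).norm(p="fro")
    return numerator / denominator

def concept_bottleneck_loss(forecast, target, bottleneck_acts, concept_targets, lam=1.0):
    """Hypothetical combined objective: forecasting MSE plus a penalty that
    steers the bottleneck activations towards the predefined concepts."""
    mse = F.mse_loss(forecast, target)
    alignment = linear_cka(bottleneck_acts, concept_targets)
    return mse + lam * (1.0 - alignment)  # penalise low CKA alignment
```

Maximising CKA (rather than, say, forcing an exact match) only constrains the bottleneck representation up to rotation and scaling, which is consistent with the paper's claim that forecasting performance is largely preserved while the bottleneck components become interpretable.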
Submission Number: 124