Interpretability for Time Series Transformers Using a Concept Bottleneck Framework

TMLR Paper 4752 Authors

28 Apr 2025 (modified: 03 Jul 2025) · Under review for TMLR · CC BY 4.0
Abstract: There has been a recent surge of research on Transformer-based models for long-term time series forecasting, but the interpretability of these models remains largely unexplored. To address this gap, we develop a framework based on Concept Bottleneck Models. We modify the training objective to encourage the model to develop representations that align with predefined interpretable concepts, using Centered Kernel Alignment. We apply the framework to the vanilla Transformer and the Autoformer, and present an in-depth analysis on synthetic data and on a variety of benchmark datasets. We find that model performance remains largely unaffected while interpretability improves substantially. In addition, the interpretable concepts become localized, which makes the trained model easy to intervene on. We demonstrate this with an intervention after applying a time shift to the data.
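The abstract does not specify the exact training objective; the sketch below, assuming PyTorch, illustrates how a linear-CKA alignment term could be combined with the usual forecasting loss. The function and variable names (`linear_cka`, `bottleneck_loss`, `hidden_chunks`, `lam`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def linear_cka(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Linear Centered Kernel Alignment between two representation matrices.

    x: (n_samples, dim_x) hidden activations from a designated bottleneck slice.
    y: (n_samples, dim_y) values of a predefined concept (e.g. trend, seasonality).
    Returns a scalar in [0, 1]; higher means the representations are more similar.
    """
    x = x - x.mean(dim=0, keepdim=True)   # column-center both matrices
    y = y - y.mean(dim=0, keepdim=True)
    cross = (y.t() @ x).pow(2).sum()      # ||Y^T X||_F^2
    norm_x = torch.linalg.norm(x.t() @ x) # ||X^T X||_F
    norm_y = torch.linalg.norm(y.t() @ y) # ||Y^T Y||_F
    return cross / (norm_x * norm_y + eps)


def bottleneck_loss(forecast, target, hidden_chunks, concepts, lam=0.5):
    """Hypothetical combined objective: forecasting MSE plus a CKA penalty
    that pushes each hidden chunk towards its assigned concept."""
    mse = F.mse_loss(forecast, target)
    alignment = torch.stack(
        [1.0 - linear_cka(h, c) for h, c in zip(hidden_chunks, concepts)]
    ).mean()
    return mse + lam * alignment
```

A weight such as `lam` would trade off forecasting accuracy against concept alignment; the paper's actual weighting and concept definitions may differ.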
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We incorporated the reviewers' requested changes (written in blue). The main changes are:
- Added an experiment training the Autoformer without access to timestamps as input (Appendix M)
- Added experiments on another architecture, FEDformer (Appendix J)
- Improved the readability and clarity of the paper, in particular regarding the intervention experiment and the design of the bottleneck framework
For more details, we refer to our responses to the reviewers.
Assigned Action Editor: ~Devendra_Singh_Dhami1
Submission Number: 4752