Keywords: Time-Series, Mix of Experts, Lag Effects
Abstract: Transformer-based architectures dominate time-series modeling by enabling global attention over all timestamps, yet their rigid "one-size-fits-all" context aggregation fails to address two critical challenges in real-world data: (1) inherent lag effects, where the relevance of historical timestamps to a query varies dynamically, and (2) anomalous segments, which introduce noisy signals that degrade forecasting accuracy.
To address these challenges, we propose the Temporal Mix of Experts (TMOE), a novel attention-level mechanism that reinterprets key-value (K-V) pairs as local experts (each specialized in a distinct temporal context) and performs adaptive expert selection for each query by locally filtering out irrelevant timestamps. Complementing this local adaptation, a shared global expert preserves the Transformer's strength in capturing long-range dependencies. We then replace the vanilla attention mechanism in popular time-series Transformer frameworks (i.e., PatchTST and Timer) with TMOE, without any other structural modifications, yielding the task-specific variant TimeExpert and the general variant TimeExpert-G.
Extensive experiments on seven real-world long-term forecasting benchmarks demonstrate that TimeExpert and TimeExpert-G outperform state-of-the-art methods. Code will be released after acceptance.
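To make the attention-level idea described in the abstract concrete, below is a minimal PyTorch sketch of one plausible reading: each key-value pair acts as a local expert, a per-query top-k gate keeps only the most relevant timestamps (localized filtering), and a plain global-attention path stands in for the shared global expert. All names (`TemporalMixOfExpertsAttention`, `k_local`, the learned mixing gate) and the top-k gating choice are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a TMOE-style attention layer, based only on the abstract's
# description. The top-k "localized filtering" and the sigmoid mixing gate are
# assumptions for illustration, not the paper's actual mechanism.
import torch
import torch.nn.functional as F
from torch import nn


class TemporalMixOfExpertsAttention(nn.Module):
    def __init__(self, d_model: int, k_local: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Learned scalar balancing the local-expert and global-expert paths.
        self.mix = nn.Parameter(torch.tensor(0.5))
        self.k_local = k_local

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)  # (B, T, T)

        # Shared global expert: standard attention over all timestamps.
        global_out = F.softmax(scores, dim=-1) @ v

        # Local experts: per query, keep only the k_local most relevant
        # key-value "experts" and mask the rest (localized filtering).
        k_eff = min(self.k_local, scores.size(-1))
        topk = scores.topk(k_eff, dim=-1).indices
        mask = torch.full_like(scores, float("-inf"))
        mask.scatter_(-1, topk, 0.0)
        local_out = F.softmax(scores + mask, dim=-1) @ v

        gate = torch.sigmoid(self.mix)
        return self.out_proj(gate * local_out + (1 - gate) * global_out)


if __name__ == "__main__":
    layer = TemporalMixOfExpertsAttention(d_model=64, k_local=8)
    out = layer(torch.randn(2, 96, 64))
    print(out.shape)  # torch.Size([2, 96, 64])
```

Because the layer keeps the standard query/key/value interface, it could in principle be dropped into an existing Transformer block (e.g., in PatchTST or Timer) in place of vanilla attention, which matches the abstract's claim of requiring no extra structural modifications.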
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 382