Keywords: Time-Series, Mix of Experts, Lag Effects
Abstract: Transformer-based architectures dominate time-series modeling by enabling global attention over all timestamps, yet their rigid "one-size-fits-all" context aggregation fails to address two critical challenges in real-world data: (1) inherent lag effects, where the relevance of historical timestamps to a query varies dynamically, and (2) anomalous segments, which introduce noisy signals that degrade forecasting accuracy.
To address these challenges, we propose the Temporal Mix of Experts (TMOE), a novel attention-level mechanism that reinterprets key-value (K-V) pairs as local experts (each specialized in a distinct temporal context) and performs adaptive expert selection for each query by locally filtering out irrelevant timestamps. Complementing this local adaptation, a shared global expert preserves the Transformer's strength in capturing long-range dependencies. We then replace the vanilla attention mechanism in popular time-series Transformer frameworks (i.e., PatchTST and Timer) with TMOE, without any other structural modifications, yielding the task-specific variant TimeExpert and the general variant TimeExpert-G.
Extensive experiments on seven real-world long-term forecasting benchmarks demonstrate that TimeExpert and TimeExpert-G outperform state-of-the-art methods. Code will be released after acceptance.
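To make the attention-level idea described in the abstract concrete, below is a minimal PyTorch sketch of one plausible reading: each key-value pair acts as a local expert, a per-query top-k gate keeps only the most relevant timestamps (localized filtering), and a plain global-attention path stands in for the shared global expert. All names (`TemporalMixOfExpertsAttention`, `k_local`, the learned mixing gate) and the top-k gating choice are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a TMOE-style attention layer, based only on the abstract's
# description. The top-k "localized filtering" and the sigmoid mixing gate are
# assumptions for illustration, not the paper's actual mechanism.
import torch
import torch.nn.functional as F
from torch import nn


class TemporalMixOfExpertsAttention(nn.Module):
    def __init__(self, d_model: int, k_local: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Learned scalar balancing the local-expert and global-expert paths.
        self.mix = nn.Parameter(torch.tensor(0.5))
        self.k_local = k_local

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)  # (B, T, T)

        # Shared global expert: standard attention over all timestamps.
        global_out = F.softmax(scores, dim=-1) @ v

        # Local experts: per query, keep only the k_local most relevant
        # key-value "experts" and mask the rest (localized filtering).
        k_eff = min(self.k_local, scores.size(-1))
        topk = scores.topk(k_eff, dim=-1).indices
        mask = torch.full_like(scores, float("-inf"))
        mask.scatter_(-1, topk, 0.0)
        local_out = F.softmax(scores + mask, dim=-1) @ v

        gate = torch.sigmoid(self.mix)
        return self.out_proj(gate * local_out + (1 - gate) * global_out)


if __name__ == "__main__":
    layer = TemporalMixOfExpertsAttention(d_model=64, k_local=8)
    out = layer(torch.randn(2, 96, 64))
    print(out.shape)  # torch.Size([2, 96, 64])
```

Because the layer keeps the standard query/key/value interface, it could in principle be dropped into an existing Transformer block (e.g., in PatchTST or Timer) in place of vanilla attention, which matches the abstract's claim of requiring no extra structural modifications.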
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 382