Mixture of Experts for Time Series Foundation Models

Published: 10 Oct 2024, Last Modified: 26 Nov 2024, NeurIPS 2024 TSALM Workshop, CC BY 4.0
Keywords: time series forecasting, foundation model
Abstract: Time series foundation models, such as MOIRAI, have shown exceptional zero-shot forecasting capabilities. However, to enable cross-frequency learning, they employ multiple linear projection layers, each specialized for time series of a specific frequency. This design has two major limitations: (1) Time series data are imbalanced across frequencies, leading to insufficient training of the parameters for underrepresented frequencies and diminishing the effectiveness of cross-frequency learning. (2) Specialization at the frequency level is coarse-grained. For instance, time series with similar patterns but different frequencies can be mapped to undesirably distinct embeddings. Moreover, time series of the same frequency can exhibit diverse patterns, and a single linear layer lacks the capacity to handle such complexity. To address these issues holistically, this paper proposes MOIRAI-MOE, which uses a single projection layer and delegates the modeling of diverse time series patterns to a mixture of experts (MoE) within the Transformer. By leveraging experts for token-level specialization, MOIRAI-MOE achieves superior unified learning capabilities and delivers significant improvements in both in-distribution and zero-shot evaluations.
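To make the architectural idea concrete, below is a minimal sketch of a token-level, top-k gated mixture-of-experts feed-forward block of the kind the abstract describes: a single shared input projection, with specialization delegated to sparsely routed experts inside the Transformer. The class name, expert count, top-k value, and expert structure are illustrative assumptions, not MOIRAI-MOE's actual configuration.

```python
# Hypothetical sketch of a token-level MoE feed-forward block (PyTorch).
# Not the paper's implementation; expert count and routing details are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Top-k gated mixture of experts applied independently to each token."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router over experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); each token is routed independently.
        scores = self.gate(x)                                    # (B, T, num_experts)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = F.softmax(topk_scores, dim=-1)                 # normalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]                            # (B, T) chosen expert ids
            w = weights[..., slot].unsqueeze(-1)                 # (B, T, 1) mixing weights
            for e, expert in enumerate(self.experts):
                mask = idx == e                                  # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


# Usage sketch: a single frequency-agnostic input projection feeds a Transformer
# whose feed-forward sublayers are replaced by MoEFeedForward blocks.
tokens = torch.randn(4, 32, 128)          # (batch, patch tokens, d_model)
moe_ffn = MoEFeedForward(d_model=128, d_hidden=256)
print(moe_ffn(tokens).shape)              # torch.Size([4, 32, 128])
```

In this sketch, the router replaces frequency-based specialization: which expert processes a token is decided from the token's content rather than from a human-assigned frequency label.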
Submission Number: 68