Abstract: Achieving effective unified pretraining on large time series corpora remains an open challenge in developing time series foundation models. Existing methods, such as Moirai, introduce multiple projection layers for time series of different frequencies to account for high data heterogeneity. We identify major drawbacks of this human-imposed frequency-level model specialization. First, frequency is not a reliable indicator for grouping pretraining data. Second, time series can display varied distributions even within a short window, and frequency-level specialization overlooks diversity at this granularity. To address these issues, this paper introduces Moirai-MoE, which removes human-defined data groupings and delegates the modeling of diverse time series patterns to sparse mixture-of-experts (MoE) layers within Transformers. With this design, Moirai-MoE eliminates reliance on heuristics and enables automatic token-level specialization. Extensive evaluations on 39 datasets demonstrate the superiority of Moirai-MoE over state-of-the-art foundation models. This study also conducts comprehensive model analyses to explore the inner workings of time series MoE foundation models.
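To make the token-level specialization idea concrete, the sketch below shows a generic sparse MoE feed-forward layer of the kind that can replace the dense feed-forward block in a Transformer, with a learned gate routing each token to its top-k experts. This is not the authors' implementation; it is a minimal illustration under assumed settings (PyTorch, 8 experts, top-2 routing, illustrative dimensions), and names such as SparseMoEFeedForward are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFeedForward(nn.Module):
    """Token-level sparse mixture-of-experts feed-forward layer.

    Each token is routed to its top-k experts by a learned gating network,
    so specialization emerges per token rather than per frequency group.
    """

    def __init__(self, d_model: int = 384, d_ff: int = 1536,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten to route per token.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Gate scores and top-k expert selection for each token.
        logits = self.gate(tokens)                                  # (T, E)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)   # (T, k)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Tokens whose top-k set includes expert e.
            token_idx, slot = torch.where(indices == e)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += (weights[token_idx, slot].unsqueeze(-1)
                               * expert(tokens[token_idx]))

        return out.reshape(batch, seq_len, d_model)

# Usage: route a batch of time series token embeddings through the layer.
moe = SparseMoEFeedForward()
x = torch.randn(4, 64, 384)   # (batch, patch tokens, hidden size)
print(moe(x).shape)           # torch.Size([4, 64, 384])
```

Because only the top-k experts run per token, capacity grows with the number of experts while per-token compute stays close to that of a dense feed-forward layer; the grouping of patterns is learned by the gate rather than imposed by a human-defined frequency heuristic.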
Lay Summary: Time series data — like temperature changes, stock prices, or heart rate signals — can vary widely in how fast they change and what patterns they show. Traditional AI models try to handle this variety by grouping the data based on how frequently it changes. However, this approach has serious limitations: data that changes at the same speed can still look very different, and data that changes at different speeds can sometimes follow similar patterns. Our research proposes a new method called Moirai-MoE that avoids these rigid groupings. Instead of relying on human-defined categories, our model uses a technique called a “Mixture of Experts” to let the AI automatically specialize based on the patterns it sees in the data. This allows the model to adapt more flexibly and accurately to the rich variety found in real-world time series. We tested Moirai-MoE on 39 diverse datasets and found it consistently outperformed current top models. This work advances our ability to build general-purpose AI systems that can better understand and learn from complex time-based data.
Primary Area: Applications->Time Series
Keywords: Time Series Foundation Models, Sparse Mixture of Experts
Submission Number: 10674