MFMformer: Multi-resolution Mixture-of-Experts gating for Time Series Forecasting

20 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Time Series, Time Forecasting, Transformer, Frequency
Abstract: Current time series forecasting architectures mainly rely on single, unified models that lack specialization, limiting their ability to adapt to different temporal dependencies within the same model. Such approaches struggle to efficiently capture the heterogeneous nature of time series data, where different subsequences may require distinct modeling strategies. To address these challenges, we propose MFMformer: Multi-resolution Mixture-of-Experts gating for Time Series Forecasting, which combines multi-scale temporal processing with MoE layers. MFMformer introduces two key innovations: (i) an overlapping multi-resolution decomposition mechanism, inspired by the short-time Fourier transform, that splits input sequences into 50%-overlapping chunks across multiple temporal scales, with instance normalization applied independently at each scale; (ii) a Mixture-of-Experts gate that uses the top-3 dominant frequencies from FFT analysis to route inputs between 2 specialized expert networks, enhancing both representational capacity and computational efficiency. Extensive benchmarks on long-term and short-term time series datasets show that MFMformer achieves results competitive with state-of-the-art methods.
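The two mechanisms described in the abstract can be illustrated with a minimal numpy sketch. This is a hypothetical reconstruction from the abstract alone, not the authors' implementation: the chunk scales, the normalization granularity, and the way the top-3 FFT frequency indices are mapped to an expert are all assumptions for illustration.

```python
import numpy as np

def multires_decompose(x, scales=(8, 16, 32)):
    """STFT-style framing: split a 1-D series into 50%-overlapping
    chunks at each temporal scale, with instance normalization
    applied independently per scale (assumed granularity).
    Returns a dict mapping scale -> array of shape (num_chunks, scale)."""
    out = {}
    for w in scales:
        hop = w // 2  # 50% overlap between consecutive chunks
        starts = range(0, len(x) - w + 1, hop)
        chunks = np.stack([x[s:s + w] for s in starts])
        # normalize this scale independently: zero mean, unit variance
        mu, sigma = chunks.mean(), chunks.std() + 1e-8
        out[w] = (chunks - mu) / sigma
    return out

def fft_gate(x, num_experts=2, top_k=3):
    """Toy frequency-based gate: find the top-3 dominant FFT
    frequencies and map their indices to one of 2 expert ids.
    (The real gating network is presumably learned; this hash-style
    mapping only illustrates the routing signal.)"""
    spec = np.abs(np.fft.rfft(x))
    spec[0] = 0.0                      # ignore the DC component
    top = np.argsort(spec)[-top_k:]    # indices of the 3 strongest bins
    return int(top.sum()) % num_experts
```

For example, a length-64 series decomposed at scale 8 yields 15 overlapping chunks, and `fft_gate` returns an expert index in `{0, 1}` driven by the series' dominant periodicities.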
Primary Area: learning on time series and dynamical systems
Submission Number: 24040