Keywords: time series forecasting, loss function
TL;DR: We propose the MMPD loss for patch-based time series forecasting backbones to model complex future distributions, enabling them to generate multiple diverse predictions with corresponding probabilities.
Abstract: Despite the flourishing of time series (TS) forecasting backbones, their training still relies mostly on regression losses such as Mean Squared Error (MSE). However, MSE assumes a unimodal Gaussian distribution and therefore struggles to capture complex patterns, especially in real-world scenarios where multiple diverse outcomes are possible. We propose the Multi-Mode Patch Diffusion (MMPD) loss, which can be applied to any patch-based backbone that outputs latent tokens for the future. Models trained with the MMPD loss generate diverse predictions (modes) together with their corresponding probabilities. Technically, the MMPD loss models the future distribution with a diffusion model conditioned on the latent tokens from the backbone. A lightweight Patch Consistent MLP serves as the denoising network to ensure consistency across denoised patches. Multi-mode predictions are produced by a multi-mode inference algorithm that fits an evolving variational Gaussian Mixture Model (GMM) during diffusion. Experiments on eight datasets demonstrate its superiority in diverse forecasting, while its deterministic and probabilistic performance also matches that of strong competitor losses, MSE and Student-t, respectively.
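The abstract describes training a conditional diffusion model on future patches as the loss. The sketch below is an illustrative, simplified rendering of that general idea, not the authors' MMPD implementation: it applies a DDPM-style noise-prediction loss to future patches, with a small MLP denoiser conditioned on the backbone's latent tokens. Names such as `DenoiserMLP`, `patch_dim`, and `token_dim`, as well as the linear noise schedule, are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DenoiserMLP(nn.Module):
    """Per-patch denoiser conditioned on a latent token and a normalized timestep."""
    def __init__(self, patch_dim: int, token_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch_dim + token_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, patch_dim),
        )

    def forward(self, noisy_patch, token, t_frac):
        # t_frac: diffusion step normalized to [0, 1], one scalar per patch
        inp = torch.cat([noisy_patch, token, t_frac], dim=-1)
        return self.net(inp)  # predicted noise, same shape as the patch

def diffusion_loss(denoiser, future_patches, latent_tokens, num_steps: int = 1000):
    """Noise-prediction loss on future patches, conditioned on backbone tokens.

    future_patches: (B, P, patch_dim)  ground-truth future split into P patches
    latent_tokens:  (B, P, token_dim)  backbone outputs, one token per patch
    """
    B, P, _ = future_patches.shape
    t = torch.randint(1, num_steps + 1, (B, P, 1), device=future_patches.device)
    t_frac = t.float() / num_steps
    # Simple linear alpha-bar schedule, chosen only to keep the sketch short.
    alpha_bar = torch.clamp(1.0 - t_frac, min=1e-4)
    noise = torch.randn_like(future_patches)
    noisy = alpha_bar.sqrt() * future_patches + (1 - alpha_bar).sqrt() * noise
    pred_noise = denoiser(noisy, latent_tokens, t_frac)
    return ((pred_noise - noise) ** 2).mean()
```

In this reading, the loss replaces MSE at training time; at inference, repeated denoising runs conditioned on the same tokens would yield samples from the learned future distribution, to which a mixture model could then be fit to obtain modes and their probabilities.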
Primary Area: learning on time series and dynamical systems
Submission Number: 3855