TiMi: Empowering Time Series Transformers with Multimodal Mixture of Experts

ICLR 2026 Conference Submission 2097 Authors

04 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: deep learning, machine learning, time series forecasting
Abstract: Multimodal time series forecasting has garnered significant attention for its potential to provide more robust and accurate predictions than traditional single-modality models by leveraging rich information inherent in other modalities. However, due to fundamental challenges in modality alignment, existing methods often struggle to effectively incorporate multimodal data into predictions, particularly textual information that has a causal influence on time series fluctuations, such as emergency reports and policy announcements. In this paper, we reflect on the role of textual information in numerical forecasting and propose **Ti**me series transformers with Multimodal **Mi**xture-of-Experts, **TiMi**, to unleash the causal reasoning capabilities of LLMs. Concretely, TiMi utilizes language models to generate inferences on future developments, which then serve as guidance for time series forecasting. To seamlessly integrate both exogenous factors and time series into predictions, we introduce a Multimodal Mixture-of-Experts (MMoE) module as a lightweight plug-in to empower Transformer-based time series models for multimodal forecasting, eliminating the need for explicit representation-level alignment. Experimentally, our proposed TiMi demonstrates consistent state-of-the-art performance on sixteen real-world multimodal forecasting benchmarks, outperforming advanced unimodal and multimodal baselines while offering strong adaptability and interpretability.
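The abstract describes the MMoE module only at a high level: experts produce forecasts, and a gate conditioned on both the time series and the LLM-generated textual guidance mixes them, avoiding explicit representation-level alignment. The submission itself does not include implementation details here, so the following is a minimal illustrative sketch of one way such a gating plug-in could look; all names, shapes, and parameters (`mmoe_forecast`, `W_experts`, `W_gate`, the embedding dimensions) are hypothetical and not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation) of a multimodal
# mixture-of-experts gate: each expert maps a series representation to a
# forecast, and a gating network conditioned on BOTH the series and a
# text embedding (e.g. an LLM's inference about future developments)
# produces the mixing weights.

rng = np.random.default_rng(0)
d_model, horizon, n_experts = 16, 4, 3

# Stand-in "trained" parameters, randomly initialized for illustration.
W_experts = rng.normal(size=(n_experts, d_model, horizon)) * 0.1
W_gate = rng.normal(size=(2 * d_model, n_experts)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mmoe_forecast(series_repr, text_repr):
    """Mix expert forecasts with weights from the fused modalities."""
    gate_logits = np.concatenate([series_repr, text_repr]) @ W_gate
    weights = softmax(gate_logits)                               # (n_experts,)
    expert_out = np.einsum('d,edh->eh', series_repr, W_experts)  # (n_experts, horizon)
    return weights @ expert_out                                  # (horizon,)

series_repr = rng.normal(size=d_model)  # hidden state from the Transformer backbone
text_repr = rng.normal(size=d_model)    # e.g. embedding of a policy announcement
forecast = mmoe_forecast(series_repr, text_repr)
print(forecast.shape)  # (4,)
```

Because the gate consumes the two modalities jointly but the experts never need to align text and numeric representations in a shared space, a module of this shape can be bolted onto an existing Transformer forecaster, which is consistent with the "lightweight plug-in" framing in the abstract.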
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2097