Keywords: time series forecasting, multimodal time series model, cross-modality alignment
Abstract: Time series forecasting plays a vital role in decision-making across a wide range of real-world domains and has been extensively studied. Most existing single-modal time series models rely solely on numerical series and therefore suffer from the limitations of insufficient information. Recent studies have revealed that multimodal models can alleviate this issue by integrating textual information. However, these models primarily employ coarse-grained meta information defined for the whole dataset (\emph{e.g.}, task instructions, domain descriptions, data statistics), while the use of sample-specific textual contexts remains underexplored. To this end, we propose Dual-Forecaster, a pioneering multimodal time series model that exploits finer-grained, sample-level textual information through a well-designed dual-scale alignment technique. Specifically, we decouple the learning of semantic and patch-level features, enabling the direct extraction of both global semantic representations critical for cross-modal understanding and local patch features essential for time series forecasting. Our comprehensive evaluations demonstrate that Dual-Forecaster is a distinctly effective multimodal time series model that outperforms or matches other state-of-the-art models, highlighting the benefit of integrating textual information for time series forecasting. This work opens new avenues for combining textual information with numerical time series data in multimodal time series analysis.
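To make the dual-scale idea in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: a time-series encoder that yields (a) a global semantic embedding aligned with a per-sample text embedding and (b) local patch features fed to a forecasting head. All module names, dimensions, and the choice of an InfoNCE-style alignment loss are illustrative assumptions.

```python
# Hypothetical sketch of "decoupled" dual-scale feature learning; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualScaleTSEncoder(nn.Module):
    def __init__(self, patch_len=16, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.patch_len = patch_len
        self.patch_embed = nn.Linear(patch_len, d_model)           # local patch tokens
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))  # global semantic token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):
        # x: (batch, seq_len) univariate series; split into non-overlapping patches
        b, _ = x.shape
        patches = x.unfold(1, self.patch_len, self.patch_len)      # (b, n_patches, patch_len)
        tokens = self.patch_embed(patches)                         # (b, n_patches, d_model)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1), tokens], dim=1)
        h = self.encoder(tokens)
        semantic = h[:, 0]       # global representation for cross-modal alignment
        patch_feats = h[:, 1:]   # local features for the forecasting head
        return semantic, patch_feats


def alignment_loss(ts_semantic, text_emb, temperature=0.07):
    """InfoNCE-style loss pairing each series with its sample-level text (assumed choice)."""
    ts = F.normalize(ts_semantic, dim=-1)
    tx = F.normalize(text_emb, dim=-1)
    logits = ts @ tx.t() / temperature
    labels = torch.arange(ts.size(0), device=ts.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    enc = DualScaleTSEncoder()
    series = torch.randn(8, 96)        # 8 samples, 96 time steps
    text_emb = torch.randn(8, 128)     # e.g. from a frozen text encoder (placeholder)
    sem, patches = enc(series)
    head = nn.Linear(128 * patches.size(1), 24)   # 24-step forecast head
    forecast = head(patches.flatten(1))
    loss = alignment_loss(sem, text_emb) + F.mse_loss(forecast, torch.randn(8, 24))
    print(forecast.shape, loss.item())
```

The key design point illustrated here is the decoupling: the semantic token carries the cross-modal alignment signal, while the patch tokens remain dedicated to the numerical forecasting objective.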
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17717