NUMBERS AS TEXT: COMPLEMENTARY DUAL-MODALITY EMBEDDINGS FOR TIME SERIES FORECASTING

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Time Series Forecasting, Multi-Modality, LLM
TL;DR: Without complex prompts, we achieve SOTA time series forecasting by feeding the model two complementary views of the raw data (its numerical and textual forms), establishing a new approach for applying LLMs in this domain.
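A minimal sketch of what "two complementary views of the raw data" could look like for a single input window, assuming a plain comma-separated, fixed-precision text format. The abstract does not say how the series is serialized; `series_to_text`, `dual_views`, and the `decimals` parameter are hypothetical names used only for illustration.

```python
import numpy as np


def series_to_text(window, decimals=2):
    """Render a raw numeric window as text.

    Hypothetical format: comma-separated values with fixed precision.
    The source does not specify the actual serialization.
    """
    return ", ".join(f"{float(v):.{decimals}f}" for v in window)


def dual_views(window, decimals=2):
    """Return both views of one raw window: the numerical array and its textual form."""
    values = np.asarray(window, dtype=np.float64)
    text = series_to_text(values, decimals)
    return values, text


if __name__ == "__main__":
    values, text = dual_views([21.5, 22.0, 23.25])
    print(values.shape, text)  # (3,) 21.50, 22.00, 23.25
```

The numerical view would feed a value embedding directly, while the text view is what an LLM would encode; no hand-engineered statistics or dataset-specific prompt is added to either.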
Abstract: The remarkable success of Large Language Models (LLMs) in language tasks inspires us to explore their application to long-term time series forecasting (LTSF). Their ability to capture complex sequence dependencies from massive datasets suggests strong potential for modeling the intricate patterns inherent in time series data. However, current methods for applying LLMs to LTSF often rely on hand-engineered statistical features and elaborate, dataset-specific prompts. This approach not only deviates from the end-to-end learning paradigm but also introduces a critical risk of lookahead bias: performance gains may stem from the model accessing information in its pre-training corpus rather than from genuine forecasting ability. A clear gap therefore exists for a method that applies LLMs to raw time series data in a robust, feature-free manner. To address this gap, we propose NumText, a novel framework that operates directly on raw time series data. Our method treats each series as a dual-modality input and generates two parallel representations: a direct numerical value embedding and a forecasting-oriented LLM embedding derived from the series' textual form. These embeddings are combined through a modality-specific Mixture-of-Experts (MoE) into a rich, unified input for a downstream attention mechanism. We further introduce a time-series text embedding cache that reduces computational overhead during inference. Extensive experiments show that the numerical value embeddings and the LLM's textual embeddings are highly complementary, capturing distinct yet synergistic signals for forecasting. This synergy enables our model to improve upon current state-of-the-art (SOTA) performance on several benchmark time series forecasting (TSF) datasets, establishing a more robust approach for applying LLMs in this domain.
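The fusion step and the embedding cache described in the abstract can be sketched as follows. This is an illustrative sketch under several assumptions, not NumText's implementation: each modality gets its own softmax-gated set of linear experts, the two MoE outputs are summed (the abstract does not say whether they are summed or concatenated), the cache is a dictionary keyed by the series' textual form, and `placeholder_llm_embed` is a hypothetical stand-in for a real LLM encoder that returns a deterministic pseudo-random vector. All class and function names here are hypothetical.

```python
import hashlib

import numpy as np


def placeholder_llm_embed(text, dim=16):
    """Hypothetical stand-in for an LLM text encoder.

    Returns a deterministic pseudo-random vector seeded by a hash of the text.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "little")
    return np.random.default_rng(seed).standard_normal(dim)


class TextEmbeddingCache:
    """Memoizes text embeddings keyed by the series' textual form (assumed design)."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}
        self.misses = 0  # number of times the (expensive) encoder was actually called

    def get(self, text):
        if text not in self.store:
            self.misses += 1
            self.store[text] = self.embed_fn(text)
        return self.store[text]


class ModalityMoE:
    """Softmax-gated mixture of linear experts for one modality (assumed form)."""

    def __init__(self, in_dim, out_dim, n_experts, rng):
        self.experts = rng.standard_normal((n_experts, in_dim, out_dim)) / np.sqrt(in_dim)
        self.gate = rng.standard_normal((in_dim, n_experts)) / np.sqrt(in_dim)

    def __call__(self, x):
        logits = x @ self.gate                                # (n_experts,)
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()                              # softmax gate weights
        expert_out = np.einsum("i,kio->ko", x, self.experts)  # (n_experts, out_dim)
        return weights @ expert_out                           # (out_dim,)


def fuse(window, num_moe, txt_moe, cache, decimals=2):
    """Combine the numerical view and the cached text embedding into one vector."""
    values = np.asarray(window, dtype=np.float64)
    text = ", ".join(f"{v:.{decimals}f}" for v in values)
    text_emb = cache.get(text)  # encoder runs only on a cache miss
    return num_moe(values) + txt_moe(text_emb)  # summation is an assumption


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cache = TextEmbeddingCache(lambda t: placeholder_llm_embed(t, dim=16))
    num_moe = ModalityMoE(in_dim=4, out_dim=8, n_experts=3, rng=rng)
    txt_moe = ModalityMoE(in_dim=16, out_dim=8, n_experts=3, rng=rng)
    window = [1.0, 2.0, 3.0, 4.0]
    z1 = fuse(window, num_moe, txt_moe, cache)
    z2 = fuse(window, num_moe, txt_moe, cache)  # served from the cache
    print(z1.shape, cache.misses)  # (8,) 1
```

The second call on the same window reuses the stored text embedding, which illustrates how a cache keyed on the textual form could avoid repeated LLM forward passes at inference time. In a full model the fused vectors would then be passed to the downstream attention mechanism, which this sketch omits.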
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 10371