Keywords: Time Series Forecasting, Large Language Model, Reinforcement Learning
TL;DR: We propose Time-R1, a time series forecasting framework leveraging slow-thinking LLMs with multi-step reasoning and reinforcement fine-tuning to improve forecast accuracy.
Abstract: To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm, relying on pattern recognition and trend prediction as their core modeling philosophy and lacking an explicit "thinking process" that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., ChatGPT-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone presents several limitations, including high computational cost, privacy risks, and limited capacity for in-depth domain-specific time series reasoning. A more promising approach is therefore to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. To this end, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage conducts supervised fine-tuning for warm-up adaptation, while the second stage employs reinforcement learning to improve the model's generalization ability. In particular, we introduce GRIP (group-based relative importance for policy optimization), which utilizes non-uniform sampling along with a fine-grained multi-objective reward specifically designed for time series forecasting to further encourage and optimize the model's exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecast performance across diverse datasets. Source code is available at https://anonymous.4open.science/r/Time-R1-NeurIPS-2025/.
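The abstract describes GRIP as combining a group-based relative baseline, a multi-objective reward, and non-uniform sampling over rollouts. A minimal sketch of how such a step could look is below; the function names, the reward components, the weighting scheme, and the softmax-based sampling rule are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def grip_advantages(rewards, weights=(0.5, 0.3, 0.2)):
    """Hypothetical group-relative advantage step.

    rewards: (G, K) array of K reward components (e.g. forecast
    accuracy, output format, reasoning quality -- assumed components)
    for each of G sampled reasoning paths in one group.
    """
    weights = np.asarray(weights, dtype=float)
    scalar = rewards @ weights  # combine multi-objective rewards into one scalar
    # Advantage relative to the group's own mean (no learned critic).
    return (scalar - scalar.mean()) / (scalar.std() + 1e-8)

def nonuniform_sample(advantages, n, rng):
    """Illustrative non-uniform sampling: favor higher-advantage
    reasoning paths when selecting rollouts for the policy update."""
    p = np.exp(advantages - advantages.max())  # softmax weights, numerically stable
    p /= p.sum()
    return rng.choice(len(advantages), size=n, replace=False, p=p)
```

In this sketch the group-relative baseline plays the role a critic would in standard PPO, and the sampling probabilities bias the update toward more effective reasoning paths; the actual GRIP objective may differ.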
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 12694