Keywords: Time Series, Multimodal Large Language Models
TL;DR: This work introduces ThinkTime, the first time series multimodal LLM with interleaved CoT and tool calls, trained via a two-stage pipeline, which significantly improves time series reasoning while preserving alignment between time series and text.
Abstract: Understanding and reasoning over time series remains an important yet unsolved challenge for multimodal large language models (MLLMs). Current time series MLLMs (TS-MLLMs) often struggle with complex tasks because their reasoning process is overly simplified. In this work, we argue that deep thinking is essential for comprehensively understanding and effectively reasoning over time series. We present ThinkTime, the first TS-MLLM that supports Interleaved Time series Chain-of-Thought (iTCoT) with integrated tool calls. In iTCoT, the reasoning process is interleaved with tool calls, allowing the model to dynamically incorporate information from time series slices into its thought process. To enable comprehensive analysis, we introduce two fundamental operations, slice and compare, designed to examine fine-grained local details and correlation patterns, respectively. To support this capability, we design a two-stage training process and propose a task-specific method for constructing training data from synthetic time series. In the supervised fine-tuning stage, an iTCoT dataset teaches the model to integrate tool responses into its reasoning process. In the reinforcement learning stage, we implement an RL training framework for TS-MLLMs that supports iTCoT, further improving the model's reasoning and tool-use abilities. Experiments on a wide range of real-world time series demonstrate that ThinkTime achieves substantial improvements on reasoning tasks while maintaining high alignment between time series and text descriptions.
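For readers unfamiliar with the interleaved tool-call pattern, the following minimal Python sketch illustrates the kind of iTCoT loop the abstract describes: the model alternates free-form reasoning with slice and compare tool calls, and each tool result is fed back into the context for the next reasoning step. This is not the authors' implementation; every name here (ToolCall, slice_series, compare_series, run_itcot, model_step) is a hypothetical placeholder, and the two tools are reduced to a sub-window extraction and a simple correlation purely for illustration.

from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class ToolCall:
    name: str   # "slice" or "compare"
    args: dict  # e.g. {"series": 0, "start": 10, "end": 50}


def slice_series(series: Sequence[float], start: int, end: int) -> List[float]:
    # Return a sub-window of one series so local detail can be inspected.
    return list(series[start:end])


def compare_series(a: Sequence[float], b: Sequence[float]) -> float:
    # Return a Pearson-style correlation between two slices (trimmed to equal length).
    n = min(len(a), len(b))
    ma, mb = sum(a[:n]) / n, sum(b[:n]) / n
    cov = sum((a[i] - ma) * (b[i] - mb) for i in range(n))
    sd_a = sum((a[i] - ma) ** 2 for i in range(n)) ** 0.5
    sd_b = sum((b[i] - mb) ** 2 for i in range(n)) ** 0.5
    return cov / (sd_a * sd_b + 1e-8)


# One decoding step of the TS-MLLM, abstracted as a callable:
# context -> (thought, tool call or None, final answer or None).
StepFn = Callable[[str], Tuple[str, Optional[ToolCall], Optional[str]]]


def run_itcot(model_step: StepFn, series_bank: List[List[float]],
              question: str, max_steps: int = 8) -> str:
    # Interleave reasoning with tool calls until the model emits a final answer.
    context = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, call, answer = model_step("\n".join(context))
        context.append(f"Thought: {thought}")
        if answer is not None:
            return answer
        if call is not None:
            if call.name == "slice":
                result = slice_series(series_bank[call.args["series"]],
                                      call.args["start"], call.args["end"])
            elif call.name == "compare":
                result = compare_series(series_bank[call.args["a"]],
                                        series_bank[call.args["b"]])
            else:
                result = "unknown tool"
            # The tool response is appended to the context, so the next
            # reasoning step can condition on it, as in iTCoT.
            context.append(f"Tool[{call.name}] -> {result}")
    return "no answer within step budget"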
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 18256