Chain-of-thought Reviewing and Correction for Time Series Question Answering

ACL ARR 2026 January Submission2688 Authors

03 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Time Series Question Answering, Large Language Models, Chain-of-Thought, Time Series Analysis
Abstract: With the advancement of large language models (LLMs), diverse time series analysis tasks are being reformulated as time series question answering (TSQA) through a unified natural language interface. However, existing LLM-based approaches largely adopt general natural language processing techniques and are prone to reasoning errors when handling complex numerical sequences. Unlike purely textual tasks, time series data are inherently verifiable, enabling consistency checking between reasoning steps and the original input. Motivated by this property, we propose T3LLM, which performs multi-step reasoning with an explicit correction mechanism for TSQA. The T3LLM framework consists of three LLMs: a worker, a reviewer, and a student. The worker generates step-wise chains of thought (CoT) under structured prompts, the reviewer identifies erroneous steps and provides corrective comments, and the corrected CoT are used to fine-tune the student model, internalizing reasoning and self-correction into its parameters. Experiments on multiple real-world TSQA benchmarks demonstrate that T3LLM achieves state-of-the-art performance over strong LLM-based baselines.
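The worker–reviewer loop described in the abstract can be sketched as follows. This is a minimal illustration of the idea (checking each reasoning step against the raw series and emitting corrective comments), not the authors' implementation: the deterministic `worker_cot` and `reviewer` functions here are hypothetical stand-ins for the actual LLM calls.

```python
# Sketch of T3LLM's review-and-correct idea: reasoning steps over a time
# series are verifiable, so a reviewer can recompute each step from the
# raw input and correct the worker's chain of thought (CoT).
from dataclasses import dataclass

@dataclass
class Step:
    claim: str    # natural-language reasoning step
    value: float  # numeric result the step asserts

def worker_cot(series):
    """Stand-in worker LLM: emits a step-wise CoT (with a deliberate error)."""
    return [
        Step("maximum of the series", max(series) + 1.0),  # erroneous step
        Step("minimum of the series", min(series)),        # correct step
    ]

def reviewer(series, steps):
    """Stand-in reviewer LLM: verify each step against the input series,
    replace wrong values, and record corrective comments."""
    checkers = {
        "maximum of the series": max,
        "minimum of the series": min,
    }
    corrected, comments = [], []
    for s in steps:
        truth = float(checkers[s.claim](series))
        if s.value != truth:
            comments.append(f"'{s.claim}' asserted {s.value}, corrected to {truth}")
            s = Step(s.claim, truth)
        corrected.append(s)
    return corrected, comments

series = [3.0, 7.0, 2.0, 5.0]
corrected_cot, comments = reviewer(series, worker_cot(series))
# In T3LLM, (question, corrected_cot) pairs would then serve as
# fine-tuning data so the student internalizes self-correction.
print(comments)
print([(s.claim, s.value) for s in corrected_cot])
```

The key property the sketch relies on is the one the abstract names: unlike free-form text, each numeric claim in the CoT can be recomputed from the input series, so erroneous steps are mechanically detectable.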
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: logical reasoning, multimodal QA, reasoning
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 2688