ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset

Yilin Wang; Peixuan Lei; Jie Song; Yuzhe Hao; Tao Chen; Yuxuan Zhang; Lei Jia; Yuanxiang Li; Zhongyu Wei

Published: 01 May 2025, Last Modified: 15 Aug 2025. Venue: ICML 2025 poster. License: CC BY 4.0

TL;DR: To bridge time-series data and natural language, we propose ITFormer and introduce EngineMT-QA, enabling efficient and accurate Time-Series Question Answering for multi-modal AI.

Abstract: Time-series data are critical in diverse applications, such as industrial monitoring, medical diagnostics, and climate research. However, effectively integrating these high-dimensional temporal signals with natural language for dynamic, interactive tasks remains a significant challenge. To address this, we introduce the Time-Series Question Answering (Time-Series QA) task and release EngineMT-QA, the first large-scale, multi-task, temporal-textual QA dataset designed to capture complex interactions between time-series signals and natural language. Building on this resource, we propose the Instruct Time Transformer (ITFormer), a novel framework that bridges time-series encoders with frozen large language models (LLMs). ITFormer effectively extracts, aligns, and fuses temporal and textual features, achieving a substantial improvement in QA accuracy over strong baselines with fewer than 1% additional trainable parameters. By combining computational efficiency with robust cross-modal modeling, our work establishes an adaptable paradigm for integrating temporal data with natural language, paving the way for new research and applications in multi-modal AI. More details about the project, including datasets and code, are available at: https://pandalin98.github.io/itformer_site/.
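
To give a feel for what "fewer than 1% additional trainable parameters" can mean when a small trainable module sits next to a frozen LLM, the following is a rough parameter-count sketch. It is not the ITFormer architecture: the connector layout (a few cross-attention layers plus one projection), the widths, the layer count, and the 7-billion-parameter backbone size are all hypothetical values chosen for illustration, and layer norms and feed-forward blocks are ignored.

```python
"""Back-of-the-envelope count of trainable vs. frozen parameters.

All sizes below are hypothetical placeholders, not values from the paper.
"""


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters of one dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)


def cross_attention_params(d_model: int) -> int:
    """Query, key, value, and output projections of one cross-attention layer."""
    return 4 * linear_params(d_model, d_model)


def connector_params(d_model: int, d_llm: int, n_layers: int) -> int:
    """Hypothetical connector: n_layers of cross-attention plus one projection
    from the connector width into the LLM embedding width."""
    attention = n_layers * cross_attention_params(d_model)
    projection = linear_params(d_model, d_llm)
    return attention + projection


def additional_fraction(trainable: int, frozen: int) -> float:
    """Trainable parameters expressed as a fraction of the frozen backbone."""
    return trainable / frozen


# Assumed backbone size, for illustration only.
FROZEN_LLM_PARAMS = 7_000_000_000

trainable = connector_params(d_model=1024, d_llm=4096, n_layers=2)
fraction = additional_fraction(trainable, FROZEN_LLM_PARAMS)
print(f"trainable={trainable:,} fraction={fraction:.4%}")
```

Under these assumed sizes the trainable share stays well below 1% of the frozen backbone. The actual figure for ITFormer depends on its real architecture and on the LLM it is paired with.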

Lay Summary: Many systems we rely on, such as aircraft engines or medical monitors, generate large amounts of sensor data over time. Interpreting this data quickly and accurately is crucial for safety, maintenance, and decision-making. However, it's difficult for humans to make sense of these complex signals directly, and even today’s most advanced AI models struggle to answer questions about them. Our work introduces a new approach that helps AI models understand and respond to natural language questions about time-series data. For example, a mechanic might ask, "Does this engine signal show signs of failure?" or "What actions should I take based on recent data?" To make this possible, we created a new dataset based on real aircraft engine data, containing over 110,000 question-answer examples. This dataset captures different types of questions, such as understanding patterns, diagnosing faults, predicting risks, and making operational decisions. We also developed a new AI system called ITFormer, which connects time-series data with large language models like ChatGPT. ITFormer learns how to explain sensor data using natural language while using very few extra parameters. It outperforms existing AI models in both accuracy and speed and works well even with limited computing resources. In short, this research makes time-series data more understandable and actionable for both humans and machines, with applications in aerospace, healthcare, energy, and beyond.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Link To Code: https://papercheck.icml.cc/process.php

Primary Area: Applications->Time Series

Keywords: Time Series Analysis, Time-Series Question Answering, Time-Series-Textual Alignment, Time-Series-Textual Fusion

Flagged For Ethics Review: true

Submission Number: 16325
