Keywords: Multimodal Time Series, Time Series Question Answering
Abstract: Understanding the relationship between textual data and time-series evolution is a critical yet under-explored challenge in applied data science. While multimodal learning has gained traction, existing time-series benchmarks provide limited support for evaluating cross-modal reasoning and complex question answering, both essential for capturing interactions between narrative information and temporal patterns.
To bridge this gap, we introduce Multimodal Time Series Benchmark (MTBench), a large-scale benchmark designed to evaluate large language models (LLMs) on joint reasoning over time-series and text, exemplified through the financial and weather domains. MTBench consists of paired time-series and textual data, including financial analysis with aligned stock price movements and weather reports matched to historical temperature records. Unlike existing benchmarks focused on isolated modalities, MTBench offers a comprehensive testbed for language models to jointly reason over structured numerical trends and unstructured textual narratives.
MTBench supports diverse tasks that require a deep understanding of both text and time-series data, including forecasting, semantic and technical trend analysis, and news-driven question answering (QA). These tasks assess the model’s ability to capture temporal dependencies, extract key insights from text, and integrate cross-modal information.
We benchmark state-of-the-art LLMs on MTBench, providing a systematic analysis of their effectiveness in capturing the causal relationships between textual narratives and temporal patterns. Our findings reveal significant challenges for current models, including difficulty with long-term dependencies, limited causal interpretation of financial and weather dynamics, and insufficient multimodal fusion. MTBench establishes a foundation for advancing multimodal time-series research and for developing the next generation of multimodal models capable of reasoning across narrative and time-series data.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 22538