Can Language Models Infer Event Descriptions from Time Series?

ACL ARR 2025 February Submission550 Authors

09 Feb 2025 (modified: 09 May 2025) | ACL ARR 2025 February Submission | Readers: Everyone | License: CC BY 4.0
Abstract: Time series data measure how environments change over time and drive decision-making in critical domains like finance and healthcare. When analyzing time series, we often seek to understand the underlying events occurring in the measured environment. For example, one might ask: *"What corporate announcement may have caused a sharp drop in the stock price?"* Events are aptly described with language, so we conduct the first study of whether Large Language Models (LLMs) can infer natural language events from time series. We curate a new benchmark of win probabilities collected from 4,200 basketball and American football games, comprising 1.7M timesteps with corresponding events. Building on the recent wave of work using LLMs for time series, we extensively evaluate 16 LLMs and find that they demonstrate promising abilities to infer events from time series. An open-weights model, DeepSeek-R1 32B, outperforms proprietary models like GPT-4o. Despite this impressive initial performance, we also find clear avenues to improve recent models: we identify failures when altering the provided context, event sequence lengths, and evaluation strategy. All resources needed to reproduce our work are available at https://anonymous.4open.science/r/reason_events-9861/
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Language Modeling, NLP Applications, Multimodality and Language Grounding to Vision, Robotics and Beyond, Time Series
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 550