Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Robert Williams; Arjun Ashok; Étienne Marcotte; Valentina Zantedeschi; Jithendaraa Subramanian; Roland Riachi; James Requeima; Alexandre Lacoste; Irina Rish; Nicolas Chapados; Alexandre Drouin

Published: 01 May 2025, Last Modified: 18 Jun 2025. Venue: ICML 2025 poster. License: CC BY 4.0.

TL;DR: A forecasting benchmark with problems that require the combined use of numerical historical data and textual context.

Abstract: Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a starting point, they fail to convey the complete context needed for reliable and accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge and constraints, which can be efficiently communicated through natural language. However, despite recent progress with LLM-based forecasters, their ability to effectively integrate this textual information remains an open question. To address this, we introduce "Context is Key" (CiK), a time-series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities; crucially, every task in CiK requires understanding the textual context to be solved successfully. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprisingly strong performance from LLM-based forecasting models, and also reveal some of their critical shortcomings. This benchmark aims to advance multimodal forecasting by promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0.

Lay Summary: Accurate time series forecasts in the real world rely on more than just historical numerical records. Time series are often driven by external events or have characteristic behaviours of their own, and forecasts can benefit from knowing about this additional context. However, most evaluation benchmarks for time series forecasting do not provide this kind of information. We built a benchmark of time series forecasting tasks that pair historical numerical data with side information, in the form of text, that contains context crucial for making accurate forecasts. Examples include information about an upcoming event that will influence future values (e.g., a planned outage for an ATM that will force the number of withdrawals to 0), or knowledge of a constraint that applies to the forecast (e.g., if the time series to forecast is road occupancy, then it cannot be less than 0). The benchmark, named CiK (Context is Key), contains tasks that span several domains and varied types of context. We evaluate a range of approaches on CiK, finding surprisingly strong performance from LLM-based methods while also revealing some of their critical shortcomings. Forecasting methods that can process such additional context unlock many use cases and research directions. For example, they can give people without significant modelling expertise an intuitive way to improve their forecasts with additional information. This benchmark will facilitate research into improved context-aided forecasting methods by letting researchers test how well their methods fare.
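The two kinds of context mentioned in the lay summary (a scheduled event and a hard constraint) can be illustrated with a toy sketch of why a context-free forecast falls short. This is not CiK's task format, its API, or any method evaluated in the paper: the names `ToyContextTask`, `naive_forecast`, and `apply_context` are hypothetical, and the facts that CiK states in natural language are supplied here as pre-extracted fields, which is precisely the step a context-aware model would have to perform on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical toy structure, not the CiK task format: the structured fields
# stand in for facts a model would have to extract from the textual context.
@dataclass
class ToyContextTask:
    history: list[float]
    context: str
    outage_steps: set[int] = field(default_factory=set)  # future steps forced to 0
    non_negative: bool = False  # hard lower bound of 0 stated in the context


def naive_forecast(history: list[float], horizon: int) -> list[float]:
    """Context-free baseline: extend the last observed change linearly."""
    slope = history[-1] - history[-2]
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def apply_context(forecast: list[float], task: ToyContextTask) -> list[float]:
    """Adjust a forecast using the facts stated in the task's context."""
    adjusted = []
    for step, value in enumerate(forecast):
        if step in task.outage_steps:
            value = 0.0  # scheduled event: the ATM is offline, so no withdrawals
        if task.non_negative:
            value = max(value, 0.0)  # constraint: values cannot drop below 0
        adjusted.append(value)
    return adjusted


atm = ToyContextTask(
    history=[10.0, 12.0, 14.0, 16.0],
    context="The ATM will be offline for maintenance during the second forecast step.",
    outage_steps={1},
    non_negative=True,
)
road = ToyContextTask(
    history=[5.0, 4.0, 3.0, 2.0],
    context="The series is road occupancy, so it cannot be less than 0.",
    non_negative=True,
)

atm_base = naive_forecast(atm.history, horizon=3)
road_base = naive_forecast(road.history, horizon=3)
print(atm_base, apply_context(atm_base, atm))     # [18.0, 20.0, 22.0] [18.0, 0.0, 22.0]
print(road_base, apply_context(road_base, road))  # [1.0, 0.0, -1.0] [1.0, 0.0, 0.0]
```

The context-free baseline extrapolates straight through the outage and below zero; only the information stated in the text corrects it. In the actual benchmark, no such structured fields are given, so the model must infer them from the natural-language context.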

Link To Code: https://github.com/ServiceNow/context-is-key-forecasting

Primary Area: Applications->Time Series

Keywords: time series, forecasting, multimodality, foundation models, contextual forecasting, deep learning, machine learning, natural language processing

Submission Number: 5016
