NSW-EPNEWS: A NEWS-AUGMENTED BENCHMARK FOR ELECTRICITY PRICE FORECASTING WITH LLMS

16 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Benchmark, LLM, Time-series forecasting, electricity price, AI, Machine learning
TL;DR: NSW-EPNews benchmarks TS models and LLMs on 175k half-hourly NSW prices (2015–2024) with news; 48-step forecasts show news adds little or harms, and LLMs often underperform or hallucinate—highlighting gaps for high-stakes energy forecasting.
Abstract: Electricity price forecasting is a critical component of modern energy-management systems, yet existing approaches rely heavily on numerical histories and ignore contemporaneous textual signals. We introduce NSW-EPNews, the first benchmark that jointly evaluates time-series models and large language models (LLMs) on real-world electricity-price prediction. The dataset includes over 175,000 half-hourly spot prices from New South Wales, Australia (2015–2024) and curated market-news summaries from WattClarity. We frame the task as 48-step-ahead forecasting with multimodal input: lagged prices and vectorized news for classical and state-of-the-art time-series forecasting models, and prompt-engineered structured contexts for LLMs. Our dataset yields 3.6k multimodal prompt-output pairs for LLM evaluation using specific templates. In our comprehensive benchmarks, we find that news features yield marginal benefits at best and can even degrade performance across traditional statistical, machine-learning, deep-learning and state-of-the-art time-series forecasting models. This pattern holds for open and closed-source LLMs, including GPT-4o, Gemini 1.5 Pro, Meta-Llama-3-8B-Instruct, Mistral-7B-v0.1 and Qwen-2.5-7B-Instruct. It also leads to frequent hallucinations in some closed-source models, such as fabricated or malformed price sequences. NSW-EPNews provides a rigorous testbed for evaluating grounded numerical reasoning in multimodal settings, and highlights a critical gap between current LLM capabilities and the demands of high-stakes energy forecasting.
Supplementary Material: pdf
Primary Area: datasets and benchmarks
Submission Number: 6921
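
Illustrative sketch (not from the paper): the abstract frames the task as 48-step-ahead forecasting from lagged half-hourly prices. The Python snippet below shows one plausible way to cut a price series into (lagged input, 48-step target) pairs. The function name `make_windows`, the 96-step lag length, and the 48-step stride are assumptions made for illustration only; the abstract does not state the authors' actual windowing or preprocessing.

```python
from typing import List, Sequence, Tuple

# 48 half-hourly steps = one day ahead (horizon taken from the abstract).
HORIZON = 48


def make_windows(
    prices: Sequence[float],
    n_lags: int,
    horizon: int = HORIZON,
    stride: int = 1,
) -> List[Tuple[List[float], List[float]]]:
    """Slice a price series into (lagged input, future target) pairs.

    n_lags and stride are hypothetical knobs, not values from the paper.
    """
    if n_lags <= 0 or horizon <= 0 or stride <= 0:
        raise ValueError("n_lags, horizon and stride must be positive")

    pairs: List[Tuple[List[float], List[float]]] = []
    # Last index at which a full input window plus a full target still fits.
    last_start = len(prices) - n_lags - horizon
    for start in range(0, last_start + 1, stride):
        split = start + n_lags
        lags = list(prices[start:split])               # model input
        target = list(prices[split:split + horizon])   # next 48 steps
        pairs.append((lags, target))
    return pairs


# Toy series standing in for half-hourly spot prices.
toy_prices = [float(i) for i in range(200)]
# Assumed setup: two days of history (96 steps), one window per day.
toy_pairs = make_windows(toy_prices, n_lags=96, stride=48)
```

With a stride equal to the horizon, consecutive targets do not overlap, which is one simple way to obtain one forecast per day; a stride of 1 would instead give a dense rolling evaluation.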