Forecasting-Conditioned Reinforcement Learning: Embedding Forecastability as an Inductive Bias

ICLR 2026 Conference Submission 11696 Authors

18 Sept 2025 (modified: 27 Nov 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Reinforcement Learning, Forecastability, Traffic Signal Control
TL;DR: FoRL trains RL agents to forecast their own actions, improving predictability and enabling applications like GLOSA
Abstract: We introduce Forecasting-Conditioned Reinforcement Learning (FoRL), an extension to model-free Reinforcement Learning (RL) agents that augments the policy with multi-step self-forecasts. FoRL is trained either via Reward Conditioning (RC), which rewards forecast–action consistency, or Loss Conditioning (LC), which adds an auxiliary forecasting loss. Across discrete and continuous action-space benchmarks and forecasting horizons $L \in \{2, 5, 10\}$, FoRL consistently improves forecastability, measured by Supervised Action Prediction (SAP) and World-Model Unrolling (WMU), with minimal sacrifice in environment return. Prior approaches to predictable RL have typically relied either on simplicity-inducing regularizers, which shape policies only indirectly toward more forecastable behavior, or on open-loop temporal abstraction such as action chunking. In contrast, FoRL makes predictability an explicit training signal by embedding forecasting directly into the learning problem. Compared to such entropy-based methods, FoRL achieves a superior accuracy–return trade-off and provides internal forecasts that downstream applications can consume directly. A case study on Traffic Signal Control (TSC) illustrates how FoRL-generated Internal Forecasts (IF) can support downstream tasks such as vehicle-side Green Light Optimized Speed Advisory (GLOSA). Moreover, the integrated forecastability design enables effective fine-tuning when the forecasts themselves alter the environment dynamics. Overall, FoRL elevates predictability from a post-hoc diagnostic to a first-class inductive bias for RL.
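To make the Loss Conditioning (LC) idea from the abstract concrete, here is a minimal PyTorch sketch of a policy head augmented with an L-step self-forecast head and an auxiliary cross-entropy term that penalizes disagreement between the forecasts made at time t and the actions the agent actually executes at t+1..t+L. This is not the authors' implementation; the module names, network sizes, the discrete-action assumption, and the horizon default are all illustrative assumptions.

```python
# Hedged sketch of FoRL-style Loss Conditioning (assumed details, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForecastingPolicy(nn.Module):
    """Policy network that also emits logits for its own next-L actions."""
    def __init__(self, obs_dim: int, n_actions: int, horizon: int = 5):
        super().__init__()
        self.horizon = horizon
        self.n_actions = n_actions
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.pi = nn.Linear(128, n_actions)                  # current-step action logits
        self.forecast = nn.Linear(128, horizon * n_actions)  # L-step self-forecast logits

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        policy_logits = self.pi(h)
        forecast_logits = self.forecast(h).view(-1, self.horizon, self.n_actions)
        return policy_logits, forecast_logits

def lc_auxiliary_loss(forecast_logits: torch.Tensor, future_actions: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the forecast made at time t and the actions
    actually taken at t+1..t+L (discrete actions assumed)."""
    B, L, A = forecast_logits.shape
    return F.cross_entropy(forecast_logits.reshape(B * L, A), future_actions.reshape(B * L))

# Assumed usage inside an otherwise standard model-free RL update:
#   total_loss = rl_loss + lambda_forecast * lc_auxiliary_loss(forecast_logits, future_actions)
# The Reward Conditioning (RC) variant would instead add a bonus to the environment
# reward when executed actions match earlier forecasts, leaving the loss unchanged.
```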
Primary Area: reinforcement learning
Submission Number: 11696