Keywords: Reinforcement Learning, Robust Reinforcement Learning, Offline Reinforcement Learning, Diffusion Models
TL;DR: We introduce Forecasting in Non-stationary Offline RL (FORL), a novel framework designed to be robust to passive non-stationarities, leveraging diffusion probabilistic models and time-series forecasting foundation models.
Abstract: Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing offline RL methods often assume stationarity or consider only synthetic perturbations at test time, assumptions that frequently fail in real-world scenarios characterized by abrupt, time-varying offsets. These offsets can induce partial observability, causing agents to misperceive their true state and degrading performance. To overcome this challenge, we introduce Forecasting in Non-stationary Offline RL (FORL), a framework that unifies (i) conditional diffusion-based candidate state generation, trained without presupposing any specific pattern of future non-stationarity, and (ii) zero-shot time-series foundation models. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the outset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data to simulate realistic non-stationarity, demonstrate that FORL consistently improves performance over competitive baselines. By integrating zero-shot forecasting with the agent's experience, we aim to bridge the gap between offline RL and the complexities of real-world, non-stationary environments.
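To make the two components described in the abstract concrete, below is a minimal conceptual sketch of how candidate-state generation and zero-shot offset forecasting could be combined to de-bias observations at test time. This is not the authors' implementation: the diffusion sampler, the forecaster, and all names (`forecast_offset`, `sample_candidate_states`, `select_state`) are hypothetical stand-ins used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 3


def forecast_offset(offset_history):
    """Stand-in for a zero-shot time-series foundation model:
    predict the next additive offset from past offset estimates."""
    if len(offset_history) == 0:
        return np.zeros(STATE_DIM)
    # Naive persistence forecast as a placeholder.
    return offset_history[-1]


def sample_candidate_states(observation, num_candidates=8):
    """Stand-in for a conditional diffusion model trained on the
    offline dataset: propose plausible true states near the observation."""
    noise = 0.05 * rng.standard_normal((num_candidates, STATE_DIM))
    return observation + noise


def select_state(candidates, predicted_offset, observation):
    """Pick the candidate whose implied offset (observation - candidate)
    is closest to the forecasted offset."""
    implied_offsets = observation - candidates
    errors = np.linalg.norm(implied_offsets - predicted_offset, axis=1)
    return candidates[np.argmin(errors)]


def policy(state):
    """Placeholder for a policy trained with offline RL."""
    return -0.1 * state


# Toy rollout: the environment adds an unknown, time-varying offset
# to the true state before the agent observes it.
true_state = rng.standard_normal(STATE_DIM)
offset_history = []
for t in range(10):
    offset = np.full(STATE_DIM, 0.5 * np.sin(0.3 * t))  # unknown drift
    observation = true_state + offset

    predicted_offset = forecast_offset(offset_history)
    candidates = sample_candidate_states(observation)
    corrected_state = select_state(candidates, predicted_offset, observation)

    action = policy(corrected_state)
    true_state = true_state + action  # trivial dynamics for illustration

    # The true offset is never revealed; record the implied offset of the
    # chosen candidate as the running estimate used for forecasting.
    offset_history.append(observation - corrected_state)
```

The sketch only illustrates the division of labor implied by the abstract: the generative model proposes plausible states, while the forecaster supplies a prior over the current offset used to select among them.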
Supplementary Material: zip
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 23610