Online Robust Reinforcement Learning with General Function Approximation

Published: 30 Apr 2026, Last Modified: 24 Jun 2026
Venue: ICML 2026 regular
License: CC BY 4.0
Abstract: Reinforcement learning (RL) in real-world tasks often suffers from performance degradation due to distribution shift between training and deployment environments. Distributionally Robust RL (DR-RL) addresses this issue by optimizing the worst-case performance over an uncertainty set of transition dynamics, providing an optimized baseline performance upon deployment. However, existing methods typically require strong data access assumptions (e.g., a generative model or comprehensive offline datasets) and mostly focus on tabular settings. In this paper, we introduce a purely online DR-RL algorithm with general function approximation that learns a robust policy directly from interaction, without any prior knowledge or pre-collected data. Our method uses a dual-based fitted robust Bellman update to jointly learn the value function and the robust backup operator. We establish the first regret guarantee for online DR-RL in terms of an intrinsic complexity measure, the robust Bellman–Eluder (BE) dimension, for general $\phi$-divergence uncertainty sets. Our regret bound is sublinear and independent of $|\mathcal{S}|$ and $|\mathcal{A}|$, and recovers sharp rates in structured regimes, providing a scalable method for practical DR-RL.
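To make the phrase "dual-based" concrete, the following is a minimal sketch of the standard dual form of a worst-case expectation for one member of the $\phi$-divergence family, the KL divergence. It is not the paper's algorithm: the abstract does not specify the update, and the function name `kl_robust_expectation`, the grid search over the dual variable, and the toy numbers are all assumptions made for illustration. The sketch relies on the well-known identity $\inf_{P:\,\mathrm{KL}(P\|P_0)\le\rho}\mathbb{E}_P[V]=\sup_{\lambda>0}\{-\lambda\log\mathbb{E}_{P_0}[e^{-V/\lambda}]-\lambda\rho\}$, which turns a minimization over transition distributions into a one-dimensional maximization.

```python
import numpy as np


def kl_robust_expectation(values, probs, rho, lambdas=None):
    """Approximate the worst-case mean of `values` over a KL ball of radius `rho`.

    Hypothetical helper for illustration only. It evaluates the dual objective
        g(lam) = -lam * log E_{P0}[exp(-V / lam)] - lam * rho
    on a grid of lam > 0 and returns the largest value found. Because every
    g(lam) is a lower bound on the true worst-case mean, the grid maximum is
    itself a valid (slightly conservative) lower bound.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if lambdas is None:
        # Assumed search range for the dual variable; widen it if needed.
        lambdas = np.logspace(-3, 3, 2000)

    # z[k, i] = -V_i / lam_k; log-sum-exp trick keeps the exponentials stable.
    z = -values[None, :] / lambdas[:, None]
    z_max = z.max(axis=1, keepdims=True)
    log_mgf = z_max[:, 0] + np.log((probs[None, :] * np.exp(z - z_max)).sum(axis=1))

    dual = -lambdas * log_mgf - lambdas * rho
    return float(dual.max())


# Toy next-state values and nominal transition probabilities (made up).
next_state_values = [1.0, 2.0, 3.0]
nominal_probs = [0.2, 0.5, 0.3]

nominal_mean = float(np.dot(nominal_probs, next_state_values))
robust_small = kl_robust_expectation(next_state_values, nominal_probs, rho=0.1)
robust_large = kl_robust_expectation(next_state_values, nominal_probs, rho=1.0)
print(nominal_mean, robust_small, robust_large)
```

In a robust Bellman backup, a quantity of this kind would replace the nominal expectation of the next-state value; with function approximation, the expectation inside the logarithm would have to be estimated from samples rather than summed over a known distribution. A larger radius `rho` yields a more pessimistic value.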
Lay Summary: Many real-world AI systems are trained in environments that differ from the conditions they later face in practice. For example, a self-driving car trained in clear weather may behave unpredictably during rain or snow. Existing reinforcement learning methods often struggle under such changes because they assume the environment remains unchanged after training. In this work, we develop a new learning method that helps AI systems remain reliable even when the environment changes unexpectedly. Instead of learning only from ideal conditions, our method trains the agent to prepare for difficult or worst-case situations while still learning directly through interaction with the environment. A major challenge is that modern AI systems operate in extremely large and complex environments, where traditional robust methods become computationally impractical. Our approach overcomes this limitation by combining robustness with scalable function approximation techniques commonly used in modern deep learning. We also provide mathematical guarantees showing that our method can learn efficiently without requiring unrealistic assumptions or massive pre-collected datasets. Experiments on control tasks further demonstrate improved robustness under different environmental perturbations.
Primary Area: Reinforcement Learning
Keywords: robust reinforcement learning, function approximation
Submission Number: 8667