Relaxed Transition Kernels Can Cure Underestimation in Adversarial Offline Reinforcement Learning

Published: 01 Sept 2025, Last Modified: 18 Nov 2025 · ACML 2025 Conference Track · CC BY 4.0
Abstract: Offline reinforcement learning (RL) trains policies from pre-collected data without further environment interaction. However, discrepancies between the dataset and true environment—particularly in the state transition kernel—can degrade policy performance. To simulate environment shifts without being overly conservative, we introduce a relaxed state-adversarial method that perturbs the policy while applying a controlled relaxation mechanism. This method improves robustness by interpolating between nominal and adversarial dynamics. Theoretically, we provide a performance lower bound; empirically, we show improved results across challenging offline RL benchmarks. Our approach integrates easily with existing model-free algorithms and consistently outperforms baselines, especially in high-difficulty domains like Adroit and AntMaze.
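To make the interpolation idea in the abstract concrete, the sketch below mixes a nominal next state with an adversarially perturbed one under a relaxation coefficient. This is only an illustrative reconstruction under assumptions: the helper names (`adversarial_perturbation`, `relaxed_adversarial_step`), the random bounded-norm adversary, and the coefficient `alpha` are hypothetical and not taken from the paper's algorithm.

```python
import numpy as np

def adversarial_perturbation(next_state, rng, epsilon=0.1):
    """Toy adversary: push the next state in a random direction of bounded norm.
    (Illustrative stand-in for a worst-case perturbation; not from the paper.)"""
    direction = rng.standard_normal(next_state.shape)
    direction /= np.linalg.norm(direction) + 1e-8
    return next_state + epsilon * direction

def relaxed_adversarial_step(next_state, rng, alpha=0.5, epsilon=0.1):
    """Interpolate between the nominal next state (alpha=0) and a fully
    adversarial one (alpha=1), mimicking a relaxed transition kernel."""
    adv_next = adversarial_perturbation(next_state, rng, epsilon)
    return (1.0 - alpha) * next_state + alpha * adv_next

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_next = np.array([0.3, -1.2, 0.7])           # nominal next state from the dataset
    print(relaxed_adversarial_step(s_next, rng))  # partially adversarial ("relaxed") next state
```

Setting `alpha` to 0 recovers the nominal dynamics, while `alpha` near 1 approaches a fully adversarial transition; intermediate values correspond to the controlled relaxation described in the abstract.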
Supplementary Material: zip
Submission Number: 49