Relaxed State-Adversarial Offline Reinforcement Learning: A Leap Towards Robust Model-Free Policies from Historical Data

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Offline Reinforcement Learning; Robust Reinforcement Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We introduce RAORL, a model-free offline RL method that emphasizes robustness. It treats the environment adversarially, keeping policies cautious in poorly covered regions of the data.
Abstract: Offline reinforcement learning (RL) aims to learn high-performing policies from historical data, without further interaction with the environment. While many prior studies have focused on model-based RL strategies, we present Relaxed State-Adversarial Offline RL (RAORL), a model-free offline RL method. RAORL sidesteps model-uncertainty issues by framing the problem in a state-adversarial setting, removing the need for explicit environmental modeling. Our method guarantees the policy's robustness and its ability to adapt to varying transition dynamics. Anchored in robust theoretical foundations, RAORL provides performance guarantees and introduces a conservative value function that reflects average-case outcomes over an uncertainty set. Empirical evaluations on established offline RL benchmarks indicate that RAORL matches, and frequently surpasses, the performance of state-of-the-art methods.
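To make the "average-case outcomes over an uncertainty set" idea in the abstract concrete, a minimal sketch follows. This is not the authors' implementation; names such as `q_net`, `epsilon`, and `num_perturbations` are illustrative assumptions, and the paper's actual relaxation and guarantees may differ.

```python
# Hypothetical sketch of a relaxed state-adversarial Bellman target:
# instead of backing up the single worst-case perturbed next state,
# average the target value over random perturbations in an epsilon-ball.
import torch


def relaxed_adversarial_target(q_net, next_states, rewards, dones,
                               gamma=0.99, epsilon=0.1, num_perturbations=8):
    """Average-case Bellman target over an uncertainty set of perturbed next states."""
    with torch.no_grad():
        targets = []
        for _ in range(num_perturbations):
            # Sample a perturbation inside the L-infinity ball of radius epsilon.
            noise = (torch.rand_like(next_states) * 2 - 1) * epsilon
            perturbed = next_states + noise
            # Greedy value of the perturbed next state.
            q_next = q_net(perturbed).max(dim=-1).values
            targets.append(q_next)
        # Relaxation: average over the uncertainty set rather than taking the minimum.
        avg_q_next = torch.stack(targets, dim=0).mean(dim=0)
        return rewards + gamma * (1.0 - dones) * avg_q_next
```

The averaging step is what distinguishes a "relaxed" adversarial backup from a strict worst-case one: it keeps the target conservative in uncertain regions without being as pessimistic as a pure minimax update.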
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1710