Robust Reinforcement Learning with Wasserstein Constraint

Linfang Hou; Liang Pang; Xin Hong; Yanyan Lan; Zhiming Ma; Dawei Yin

25 Sept 2019 (modified: 22 Jun 2025), ICLR 2020 Conference Blind Submission, Readers: Everyone

Abstract: Robust reinforcement learning aims to find an optimal policy with some degree of robustness to environmental dynamics. Existing learning algorithms usually achieve robustness by disturbing the current state or the simulated environmental parameters in a heuristic way, which lacks quantified robustness to the system dynamics (i.e., the transition probability). To overcome this issue, we leverage the Wasserstein distance to measure the disturbance to the reference transition probability. With the Wasserstein distance, we are able to connect the transition probability disturbance to the state disturbance and reduce an infinite-dimensional optimization problem to a finite-dimensional risk-aware problem. Through the derived risk-aware optimal Bellman equation, we first show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm, the Wasserstein Robust Advantage Actor-Critic algorithm (WRA2C). The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
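
To make the idea of measuring a disturbance to a reference transition probability concrete, the following is a minimal sketch, not the paper's WRA2C algorithm. It assumes a toy setting with discrete next states on a shared, sorted 1-D support, and the names `wasserstein_1d`, `within_budget`, and `epsilon` are hypothetical, introduced only for illustration. It computes the 1-Wasserstein distance between a reference and a perturbed next-state distribution and checks whether the perturbation stays inside a given budget.

```python
from itertools import accumulate


def wasserstein_1d(support, p, q):
    """1-Wasserstein distance between two distributions on the same sorted 1-D support.

    In one dimension, W1 equals the area between the two cumulative
    distribution functions.
    """
    if not (len(support) == len(p) == len(q)):
        raise ValueError("support, p and q must have the same length")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    total = 0.0
    # Sum |CDF_p - CDF_q| weighted by the gap to the next support point.
    for i in range(len(support) - 1):
        gap = support[i + 1] - support[i]
        total += abs(cdf_p[i] - cdf_q[i]) * gap
    return total


def within_budget(support, reference, candidate, epsilon):
    """Check whether a candidate transition distribution lies within a
    Wasserstein ball of radius epsilon around the reference (illustrative only)."""
    return wasserstein_1d(support, reference, candidate) <= epsilon


# Toy example (assumed values): next states 0, 1, 2.
states = [0.0, 1.0, 2.0]
reference = [0.5, 0.5, 0.0]   # nominal transition probabilities
perturbed = [0.0, 0.5, 0.5]   # every unit of mass shifted one state to the right

distance = wasserstein_1d(states, reference, perturbed)
```

In this toy case all probability mass moves exactly one state to the right, so the distance equals that shift. This illustrates, in a simple setting, how a perturbation of the transition distribution can be read as a perturbation of the state.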
