Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics

Huang Yu-Ting; Parameswaran Kamalaruban; Paul Rolland; Ya-Ping Hsieh; Volkan Cevher

25 Sept 2019 (modified: 22 Jun 2025); ICLR 2020 Conference Blind Submission; Readers: Everyone

Keywords: deep reinforcement learning, robust reinforcement learning, min-max problem

Abstract: We rethink two-player reinforcement learning (RL) as an instance of a distribution sampling problem in infinite dimensions. Using stochastic gradient Langevin dynamics (SGLD), we propose a new two-player RL algorithm that is a sampling variant of the two-player policy gradient method. On several MuJoCo environments, our algorithm consistently outperforms existing baselines in generalization across differing training and testing conditions.
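
The core idea in the abstract is to replace each player's deterministic gradient step with a noisy Langevin step. This can be illustrated on a toy problem. The sketch below is a minimal, hypothetical example, not the authors' algorithm (which operates on policy parameters and is specified in the paper). A scalar convex-concave game f(x, y) = xy + (a/2)x^2 - (a/2)y^2 stands in for the two players' RL objectives. The minimizing player x descends and the maximizing player y ascends, each with added Gaussian noise scaled by sqrt(2*eta/beta), and the iterates from the second half of the run are averaged. The objective, the step size eta, the inverse temperature beta, and all function names are assumptions chosen for illustration.

```python
import math
import random


def grad_x(x, y, a=1.0):
    # Partial derivative in x of the toy game f(x, y) = x*y + 0.5*a*x**2 - 0.5*a*y**2
    return y + a * x


def grad_y(x, y, a=1.0):
    # Partial derivative in y of the same toy game
    return x - a * y


def langevin_descent_ascent(steps=20000, eta=1e-2, beta=100.0, a=1.0, seed=0):
    """Hypothetical simultaneous Langevin descent-ascent on the toy game.

    x (minimizer) takes noisy gradient-descent steps and y (maximizer) takes
    noisy gradient-ascent steps. Returns the average of the iterates from the
    second half of the run.
    """
    rng = random.Random(seed)
    x, y = 2.0, -2.0                      # arbitrary starting point
    noise_scale = math.sqrt(2.0 * eta / beta)
    xs, ys = [], []
    for t in range(steps):
        gx, gy = grad_x(x, y, a), grad_y(x, y, a)
        # Langevin step: gradient step plus Gaussian noise for each player
        x = x - eta * gx + noise_scale * rng.gauss(0.0, 1.0)
        y = y + eta * gy + noise_scale * rng.gauss(0.0, 1.0)
        if t >= steps // 2:               # discard the first half as burn-in
            xs.append(x)
            ys.append(y)
    return sum(xs) / len(xs), sum(ys) / len(ys)


if __name__ == "__main__":
    mx, my = langevin_descent_ascent()
    print(f"averaged iterates: x={mx:.3f}, y={my:.3f}")  # both near the saddle point (0, 0)
```

Setting `noise_scale` to zero recovers plain gradient descent-ascent. The noise is what turns each player's update into a Langevin sampling step, and a larger `beta` concentrates the samples more tightly around the saddle point.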

Code: https://anonymous.4open.science/r/658167da-96b7-4689-8dd9-ca3dcaf19dd1/
