Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics

25 Sept 2019 (modified: 23 Mar 2025) · ICLR 2020 Conference Blind Submission
Keywords: deep reinforcement learning, robust reinforcement learning, min-max problem
Abstract: We re-think two-player reinforcement learning (RL) as an instance of a distribution-sampling problem in infinite dimensions. Using Stochastic Gradient Langevin Dynamics, we propose a new two-player RL algorithm, a sampling variant of the two-player policy gradient method. Our new algorithm consistently outperforms existing baselines, in terms of generalization across differing training and testing conditions, on several MuJoCo environments.
Code: https://anonymous.4open.science/r/658167da-96b7-4689-8dd9-ca3dcaf19dd1/
Community Implementations: 2 code implementations via CatalyzeX (https://www.catalyzex.com/paper/robust-reinforcement-learning-via-adversarial/code)
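
A minimal sketch of the two-player Langevin update the abstract describes, shown on a toy quadratic min-max game standing in for the RL objective. This is not the authors' implementation (see the anonymized repository above); the game `f`, the step size `eta`, the inverse temperature `beta`, and the iterate averaging are illustrative assumptions.

```python
import numpy as np

# Toy saddle-point problem: the protagonist x maximizes and the adversary y
# minimizes f(x, y) = x^T A y - 0.5*||x||^2 + 0.5*||y||^2, whose unique
# saddle point is (0, 0). In the paper's setting, x and y would be policy
# parameters and the gradients would come from policy-gradient estimates.
rng = np.random.default_rng(0)
d = 2
A = rng.standard_normal((d, d))

def grad_x(x, y):
    # Gradient of f with respect to the protagonist's parameters.
    return A @ y - x

def grad_y(x, y):
    # Gradient of f with respect to the adversary's parameters.
    return A.T @ x + y

eta, beta, steps = 1e-2, 1e4, 5000  # illustrative hyperparameters
x = rng.standard_normal(d)
y = rng.standard_normal(d)
x_avg = np.zeros(d)
y_avg = np.zeros(d)

for t in range(1, steps + 1):
    # Stochastic Gradient Langevin Dynamics: each player takes a gradient
    # step plus Gaussian noise scaled by sqrt(2 * eta / beta).
    noise = np.sqrt(2.0 * eta / beta)
    x = x + eta * grad_x(x, y) + noise * rng.standard_normal(d)  # ascent
    y = y - eta * grad_y(x, y) + noise * rng.standard_normal(d)  # descent
    # Treat the Langevin iterates as samples and keep a running average.
    x_avg += (x - x_avg) / t
    y_avg += (y - y_avg) / t

print("averaged protagonist iterate:", x_avg)
print("averaged adversary iterate:", y_avg)
```

Averaging the noisy iterates is what makes this a sampling variant of the two-player policy gradient method rather than a single-point solver: the output is a summary of the distribution the Langevin dynamics explores, not the last iterate.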