Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics

Sep 25, 2019 (edited Dec 24, 2019); ICLR 2020 Conference Blind Submission
  • Keywords: deep reinforcement learning, robust reinforcement learning, min-max problem
  • Abstract: We rethink two-player reinforcement learning (RL) as an instance of a distribution sampling problem in infinite dimensions. Using stochastic gradient Langevin dynamics (SGLD), we propose a new two-player RL algorithm that is a sampling variant of the two-player policy gradient method. On several MuJoCo environments, our algorithm consistently outperforms existing baselines in generalization across differing training and testing conditions.
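To illustrate the core idea of replacing point-estimate gradient updates with Langevin sampling in a two-player (min-max) problem, here is a minimal Python sketch on a toy scalar payoff rather than an RL objective. Everything in it is an assumption for illustration: the payoff f(x, y) = 0.5x² + xy − 0.5y², the hypothetical names `grad_f` and `sgld_min_max`, the step size `eta`, the inverse temperature `beta`, and the burn-in length. It is not the authors' algorithm. In the paper's setting, x and y would presumably be the policy parameters of the protagonist and the adversary, and f an expected return estimated with policy gradients.

```python
import math
import random


def grad_f(x: float, y: float) -> tuple[float, float]:
    """Partial derivatives of the hypothetical toy payoff
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.

    f is convex in x (the minimizing player) and concave in y (the
    maximizing player), with its unique saddle point at (0, 0).
    """
    return x + y, x - y


def sgld_min_max(steps: int = 5000, eta: float = 0.05, beta: float = 100.0,
                 burn_in: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Simultaneous Langevin descent-ascent on the toy payoff f (illustrative only).

    Each step is a gradient step (descent for x, ascent for y) plus Gaussian
    noise with standard deviation sqrt(2 * eta / beta), where beta is an
    inverse temperature. Returns the average of the iterates after burn-in,
    a Monte Carlo estimate of the mean of the sampled strategies.
    """
    rng = random.Random(seed)
    noise_std = math.sqrt(2.0 * eta / beta)
    x, y = 1.0, 1.0  # arbitrary starting strategies
    sum_x = sum_y = 0.0
    for t in range(steps):
        gx, gy = grad_f(x, y)  # both gradients use the current (x, y)
        x = x - eta * gx + noise_std * rng.gauss(0.0, 1.0)  # min player: noisy descent
        y = y + eta * gy + noise_std * rng.gauss(0.0, 1.0)  # max player: noisy ascent
        if t >= burn_in:
            sum_x += x
            sum_y += y
    n = steps - burn_in
    return sum_x / n, sum_y / n


if __name__ == "__main__":
    mean_x, mean_y = sgld_min_max()
    print(f"mean x = {mean_x:.4f}, mean y = {mean_y:.4f}")
```

Both returned averages land near the saddle point (0, 0). The injected noise turns each iterate into a sample rather than a point estimate; as `beta` grows the noise vanishes and the update reduces to plain simultaneous gradient descent-ascent.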