Robust Policy Optimization with Evolutionary Techniques

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Reinforcement Learning, Evolutionary methods, Policy Adaptation, robustness
TL;DR: An algorithm to adapt deep RL models to more complex versions of the environment they were trained on; this algorithm is theoretically and experimentally shown to converge to an optimal policy.
Abstract: Learning-based techniques for training control policies for autonomous agents often assume that the agent's experiences are sampled according to a specific dynamical model of the environment. However, environmental dynamics can change, whether intentionally or unintentionally. While domain randomization and robust learning can handle some distribution shifts, significant environmental shifts may necessitate re-training to learn policies that are optimal in the changed environment. We present an algorithm called "Evolutionary Robust Policy Optimization" (ERPO), inspired by evolutionary game theory (EGT), to address the problem of incrementally and efficiently adapting policies to an altered environment. We give theoretical guarantees on the convergence of our algorithm to the optimal policy under the assumption of sparse rewards. We empirically demonstrate that our algorithm outperforms several state-of-the-art deep RL algorithms in many Gym environments. Specifically, we are able to adapt policies using fewer training steps while obtaining higher rewards and requiring lower overall computation time.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 7919