Non-Parameterized Randomization for Environmental Generalization in Deep Reinforcement Learning

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning, zero-shot generalization, environmental generalization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper is the first to model and analyze environmental generalization tasks with intrinsic differences in RL, and proposes a method to address the problem.
Abstract: The generalization problem presents a major obstacle to applying reinforcement learning (RL) in real-world scenarios, primarily because retraining policies is prohibitively expensive. Environmental generalization, the ability to transfer RL agents to environments with distinct generative models but the same task semantics, remains an unsolved challenge that directly affects real-world deployment. In this paper, we build a structured mathematical framework to describe environmental generalization and show that the difficulty stems from a gap that cannot be optimized without training in all environments. Accordingly, we propose a non-parameterized randomization method to augment the training environments. We theoretically show that training in these augmented environments yields an approximately optimizable lower bound on this gap. Through empirical evaluation, we demonstrate the effectiveness of our method on zero-shot environmental generalization tasks spanning a wide range of diverse environments. Comparisons with existing advanced methods designed for generalization show that our method is significantly superior on these challenging tasks.
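To make the idea of non-parameterized randomization more concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation) of an environment whose discrete structure, such as wall layout, goal position, and start position, is re-sampled every episode, rather than perturbing a continuous simulator parameter as in standard domain randomization. All class and method names here are illustrative assumptions.

```python
import random
import numpy as np

class RandomizedGridWorld:
    """Toy grid-world whose structure (walls, goal, start) is re-sampled
    each episode. Illustrates structural randomization of the environment's
    generative model while the task semantics (reach the goal) stay fixed."""

    def __init__(self, size=8, n_walls=6, seed=None):
        self.size = size
        self.n_walls = n_walls
        self.rng = random.Random(seed)

    def reset(self):
        # Sample a new structural configuration of the environment.
        cells = [(r, c) for r in range(self.size) for c in range(self.size)]
        self.rng.shuffle(cells)
        self.walls = set(cells[: self.n_walls])
        self.goal = cells[self.n_walls]
        self.agent = cells[self.n_walls + 1]
        return self._obs()

    def step(self, action):
        # Actions: 0=up, 1=down, 2=left, 3=right.
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        nxt = (self.agent[0] + dr, self.agent[1] + dc)
        if (0 <= nxt[0] < self.size and 0 <= nxt[1] < self.size
                and nxt not in self.walls):
            self.agent = nxt
        done = self.agent == self.goal
        reward = 1.0 if done else -0.01
        return self._obs(), reward, done

    def _obs(self):
        # Encode walls, agent, and goal as a 3-channel grid observation.
        grid = np.zeros((3, self.size, self.size), dtype=np.float32)
        for r, c in self.walls:
            grid[0, r, c] = 1.0
        grid[1, self.agent[0], self.agent[1]] = 1.0
        grid[2, self.goal[0], self.goal[1]] = 1.0
        return grid
```

An agent trained across many such re-sampled layouts is exposed to environments that share task semantics but differ in their generative structure, which is the kind of augmented training distribution the abstract describes.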
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1843