Improving Generalization of Meta Reinforcement Learning via Explanation

Shicheng Liu; Minghui Zhu

Improving Generalization of Meta Reinforcement Learning via Explanation

Shicheng Liu, Minghui Zhu

26 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: explainable meta reinforcement learning; meta reinforcement learning generalization

Abstract: Meta reinforcement learning learns a meta-prior (e.g., meta-policy) from a set of training tasks, such that the learned meta-prior can efficiently adapt to all the tasks in a task distribution. However, it has been observed in literature that the learned meta-prior usually has imbalanced generalization, i.e., it adapts well to some tasks but adapts poorly to some other tasks. This paper aims to explain why certain tasks are poorly adapted and, more importantly, use this explanation to improve generalization. Our methodology has two parts. The first part identifies ``critical" training tasks that are most important to achieve good performance on those poorly-adapted tasks. An explanation of the poor generalization is that the meta-prior does not pay enough attention to the critical training tasks. To improve generalization, the second part formulates a bi-level optimization problem where the upper level learns how to augment the critical training tasks such that the meta-prior can pay more attention to the critical tasks, and the lower level computes the meta-prior distribution corresponding to the current augmentation. We propose an algorithm to solve the bi-level optimization problem and theoretically guarantee that (1) the algorithm converges at the rate of $O(1/\sqrt{K})$, (2) the learned augmentation makes the meta-prior focus more on the critical training tasks, and (3) the generalization improves after the task augmentation. We use two real-world experiments and three MuJoCo experiments to show that our algorithm improves the generalization and outperforms state-of-the-art baselines.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8143

Loading