Contextualize Me – The Case for Context in Reinforcement Learning

Published: 05 Jun 2023, Last Modified: 05 Jun 2023. Accepted by TMLR.
Abstract: While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms are still brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Our goal is to show how the framework of cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks. We confirm the insight that optimal behavior in cRL requires context information, as in other related areas of partial observability. To empirically validate this in the cRL framework, we provide various context-extended versions of common RL environments. They are part of the first benchmark library, CARL, designed for generalization based on cRL extensions of popular benchmarks, which we propose as a testbed to further study general agents. We show that in the contextual setting, even simple RL environments become challenging, and that naive solutions are not enough to generalize across complex context spaces.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
Changes 1 (marked with green)
- additional clarifications with regard to related work, as well as the addition of Whiteson et al. 2011
- updated wording in response to the feedback
- removed parts of the theory in Sections 2 and 3 in favor of extending the benchmark Section 4 by moving parts of the appendix
- updated empirical results in Section 5.4
Changes 2 (marked with magenta)
- updated language
- inclusion of related work by Koppejan & Whiteson (2009)
- removal of continual learning challenge
- using RLiable for plotting
Changes 3 (marked with blue)
- updated claims & contributions
- moved related work and highlighted Whiteson et al. 2011
- new baseline experiments in the Appendix
- updated insights in 6.3 due to these new experiments
Final Revision:
- added a note on limitations for Figure 6
- edited the abstract slightly to make the relationship to partial observability clearer
Assigned Action Editor: ~Adam_M_White1
Submission Number: 920