Abstract: While Reinforcement Learning (RL) has made great strides towards solving increasingly
complicated problems, many algorithms are still brittle to even slight environmental changes.
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in
a principled manner, thereby enabling flexible, precise and interpretable task specification
and generation. Our goal is to show how the framework of cRL contributes to improving
zero-shot generalization in RL through meaningful benchmarks and structured reasoning
about generalization tasks. We confirm the insight that optimal behavior in cRL requires
context information, as in other related areas of partial observability. To empirically validate
this in the cRL framework, we provide various context-extended versions of common RL
environments. They are part of the first benchmark library, CARL, designed for generalization
based on cRL extensions of popular benchmarks, which we propose as a testbed to further
study general agents. We show that in the contextual setting, even simple RL environments
become challenging, and that naive solutions are not enough to generalize across complex
context spaces.
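The claim that optimal behavior in cRL requires context information can be illustrated with a toy example. The sketch below is not taken from CARL and does not use its API: the environment `ContextualLineWorld`, the two policies, and the helper `average_return` are hypothetical names invented for illustration. It assumes a one-dimensional world in which a hidden context flips the direction of every action, and compares a policy that observes the context with a memoryless policy that does not.

```python
from dataclasses import dataclass


@dataclass
class ContextualLineWorld:
    """Hypothetical 1-D world: the context (+1 or -1) flips the effect of actions."""
    context: int
    goal: int = 3
    horizon: int = 10

    def rollout(self, policy, observe_context):
        """Run one episode and return the undiscounted return (-1 per step taken)."""
        x, total = 0, 0
        for _ in range(self.horizon):
            if x == self.goal:
                break
            # The context is hidden from the policy unless observe_context is True.
            obs = (x, self.context if observe_context else None)
            action = policy(obs)          # action is -1 or +1
            x += self.context * action    # the context changes the dynamics
            total -= 1
        return total


def context_aware(obs):
    """Chooses the action that moves toward the goal under the observed context."""
    _, context = obs
    return context


def context_blind(obs):
    """Memoryless policy without context: always picks +1."""
    return 1


def average_return(policy, observe_context):
    """Average return over both contexts, weighted equally."""
    returns = [
        ContextualLineWorld(c).rollout(policy, observe_context) for c in (-1, 1)
    ]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print("context-aware:", average_return(context_aware, True))
    print("context-blind:", average_return(context_blind, False))
```

In this toy setting the context-aware policy reaches the goal under both contexts, while the context-blind policy only succeeds when the context happens to match its fixed choice. A policy with memory could in principle infer the context from observed transitions, which is the connection to partial observability mentioned above.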
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Changes 1 (marked with green)
- additional clarifications with regard to related work, as well as the addition of Whiteson et al. 2011
- updated wording in response to the feedback
- removed parts of the theory in Sections 2 and 3 in favor of extending the benchmark Section 4 with material moved from the appendix
- updated empirical results in Section 5.4
Changes 2 (marked with magenta)
- updated language
- inclusion of related work by Koppejan & Whiteson (2009)
- removal of continual learning challenge
- using RLiable for plotting
Changes 3 (marked with blue)
- updated claims & contributions
- moved related work and highlighted Whiteson et al. 2011
- new baseline experiments in the Appendix
- updated insights in 6.3 due to these new experiments
Final Revision:
- Added a note on limitations for Figure 6
- Edited the abstract slightly to make relationship to partial observability clearer
Code: https://github.com/automl/carl/tree/train
Assigned Action Editor: ~Adam_M_White1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 920