Keywords: Generalization, contextual RL
Abstract: While Reinforcement Learning (RL) has shown successes in a variety of domains, including game playing, robot manipulation and nuclear fusion, modern RL algorithms are not designed with generalization in mind, making them brittle when faced with even slight variations of their environment.
To address this limitation, recent research has increasingly focused on the generalization capabilities of RL agents.
Ideally, general agents should be capable of zero-shot transfer to previously unseen environments and robust to changes in the problem setting while interacting with an environment.
Steps in this direction have been taken by proposing new problem settings in which agents can test their transfer performance, e.g. the Arcade Learning Environment's flavors, or benchmarks that use Procedural Content Generation (PCG) to increase task variation, such as ProcGen, NetHack or Alchemy.
While these extended problem settings have expanded the possibilities for benchmarking agents in diverse environments, the degree of task variation is often unknown or cannot be controlled precisely.
We believe generalization in RL is held back by these factors, which stem in part from a lack of problem formalization.
In order to facilitate generalization in RL, contextual RL (cRL) proposes to explicitly take environment characteristics, the so-called context, into account.
This inclusion enables precise design of train and test distributions with respect to this context.
Thus, cRL allows us to reason about the generalization capabilities of RL agents and to quantify their generalization performance.
Overall, cRL provides a framework for both theoretical analysis and practical improvements.
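To make this concrete, the following minimal Python sketch (all names and parameter ranges are hypothetical illustrations, not CARL's API) shows how a context, i.e. a set of environment parameters, can define precisely controlled train and test distributions:

    import random

    # A context is a set of environment parameters, e.g. pole length and
    # gravity for a CartPole-like task (values here are illustrative).
    def sample_context(rng, length_range, gravity_range):
        return {
            "pole_length": rng.uniform(*length_range),
            "gravity": rng.uniform(*gravity_range),
        }

    rng = random.Random(0)

    # Precisely designed train/test context distributions:
    # train on short poles, evaluate zero-shot on longer, unseen poles.
    train_contexts = [sample_context(rng, (0.4, 0.6), (9.8, 9.8)) for _ in range(100)]
    test_contexts = [sample_context(rng, (0.8, 1.2), (9.8, 9.8)) for _ in range(100)]

Because the train and test contexts are specified explicitly, generalization performance can be measured as the gap between returns achieved under the two distributions.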
In order to empirically study cRL, we introduce our benchmark library CARL, short for Context-Adaptive Reinforcement Learning.
CARL collects well-established environments from the RL community and extends them with the notion of context.
We use our benchmark library to empirically show how different context variations can drastically increase the difficulty of training RL agents, even in simple environments.
We further verify, both in theory and in practice, the intuition that giving RL agents access to context information is beneficial for generalization tasks.
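As a rough illustration of how such context information can be exposed to an agent (a sketch using gymnasium; the wrapper and context handling are assumptions for illustration, not CARL's actual interface), the context values can simply be concatenated to each observation:

    import numpy as np
    import gymnasium as gym
    from gymnasium.spaces import Box

    class ContextConcatWrapper(gym.ObservationWrapper):
        # Illustrative wrapper: appends a fixed context vector to every observation.
        def __init__(self, env, context):
            super().__init__(env)
            self.context_vector = np.array(list(context.values()), dtype=np.float32)
            low = np.concatenate([env.observation_space.low.astype(np.float32),
                                  np.full_like(self.context_vector, -np.inf)])
            high = np.concatenate([env.observation_space.high.astype(np.float32),
                                   np.full_like(self.context_vector, np.inf)])
            self.observation_space = Box(low=low, high=high, dtype=np.float32)

        def observation(self, obs):
            return np.concatenate([obs.astype(np.float32), self.context_vector])

    # The agent now observes the context it is acting under.
    env = ContextConcatWrapper(gym.make("CartPole-v1"), {"pole_length": 0.5})
    obs, info = env.reset(seed=0)

A context-conditioned agent of this kind can, in principle, adapt its policy to the environment parameters it observes rather than averaging over them.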
Already Accepted Paper At Another Venue: already accepted somewhere else