Keywords: Reinforcement Learning, Transfer Learning, Exploration, Test-Time Adaptation
TL;DR: Identifying the common characteristics of conventional exploration algorithms and determining which characteristics are most suitable for transferring policies between different MDPs.
Abstract: In reinforcement learning (RL), exploration helps policy models learn to solve individual tasks more efficiently and in increasingly challenging environments. In many real-world applications of RL, however, environments are non-stationary: they can change in unanticipated and unanticipatable ways, and there are conditions in which the agent must adapt its policy online, at test time, to the changed environment. Because most exploration methods are designed for the stationary MDPs of single tasks, it is not well understood which exploration methods are most beneficial for efficient online task transfer. Our first contribution is to categorize an array of exploration methods according to common "characteristics," such as being designed around a separate exploration objective or adding noise to the RL process. We then evaluate eleven exploration algorithms, within and across characteristics, on the efficiency of adaptation and transfer in multiple discrete and continuous domains. Our results show that exploration methods designed around the principles of explicit diversity and stochasticity most consistently benefit policy transfer. Additionally, our analysis considers why some characteristics correlate with improved performance and efficiency across multiple tasks, while others improve transfer performance only on specific tasks. We conclude by discussing the implications for designing future exploration algorithms that adapt most efficiently to unexpected test-time environment changes.
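To make the distinction between the two characteristics named in the abstract concrete, here is a minimal tabular sketch contrasting noise-based exploration (epsilon-greedy action noise) with exploration driven by a separate objective (a count-based intrinsic bonus). The functions, variable names, and the specific bonus form are illustrative assumptions, not the paper's actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 4
q_values = np.zeros((n_states, n_actions))      # tabular Q-value estimates
visit_counts = np.zeros((n_states, n_actions))  # state-action visitation counts

def epsilon_greedy_action(state, epsilon=0.1):
    """Noise-based exploration: act greedily except with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values[state]))

def count_bonus_action(state, beta=0.5):
    """Objective-based exploration: add a count-based intrinsic bonus to Q."""
    bonus = beta / np.sqrt(visit_counts[state] + 1.0)
    return int(np.argmax(q_values[state] + bonus))
```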
Submission Number: 2