Abstract: Reinforcement learning typically trains policies specialized for a single task. Meta-reinforcement learning (meta-RL) improves upon this by leveraging prior experience to train policies for few-shot adaptation to new tasks. However, existing meta-RL approaches often struggle to explore and learn tasks effectively. We introduce a novel meta-RL algorithm for learning to learn task-specific, sample-efficient exploration policies. We achieve this through task reconstruction, an original method for learning to identify and collect small but informative datasets from tasks. To leverage these datasets, we propose a meta-learned hyper-reward that encourages policies to learn to adapt. Empirical evaluations demonstrate that our algorithm adapts to a wider variety of tasks and achieves higher returns than existing meta-RL methods. Additionally, we show that even with full task information, adaptation is more challenging than previously assumed; nevertheless, policies trained with our hyper-reward successfully adapt to new tasks.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: All changes are described in the general comment and in the three comments addressed to each of the reviewers.
Assigned Action Editor: ~Marcello_Restelli1
Submission Number: 4198