Abstract: Reinforcement learning typically trains policies specialized for a single task. Meta-reinforcement learning (meta-RL) improves upon this by leveraging prior experience to train policies for few-shot adaptation to new tasks. However, existing meta-RL approaches often struggle to explore and learn tasks effectively. We introduce a novel meta-RL algorithm for learning to learn task-specific, sample-efficient exploration policies. We achieve this through task reconstruction, an original method for learning to identify and collect small but informative datasets from tasks. To leverage these datasets, we propose a meta-learned hyper-reward that encourages policies to learn to adapt. Empirical evaluations demonstrate that our algorithm adapts to a wider variety of tasks and achieves higher returns than existing meta-RL methods. Additionally, we show that even with full task information, adaptation is more challenging than previously assumed; nevertheless, policies trained with our hyper-reward successfully adapt to new tasks.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: All changes are described in the general comment and in the three comments addressed to each of the reviewers.
Assigned Action Editor: ~Marcello_Restelli1
Submission Number: 4198