Zero-Shot Task-Level Adaptation via Coarse-to-Fine Policy Refinement and Holistic-Local Contrastive Representations

Zhengwei Li; Zhenyang Lin; Chen Yurou; Lu Zhang; Zhi-yong Liu

28 Sept 2024 (modified: 22 Nov 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0

Keywords: Meta-RL, Zero-shot Task-level Adaptation, Contrastive Representations

TL;DR: We propose Coarse-to-Fine Policy Refinement combined with a Holistic-Local Contrastive Representation method to enable effective zero-shot task-level adaptation.

Abstract: Meta-reinforcement learning offers a mechanism for zero-shot adaptation, enabling agents to handle new tasks with parametric variation in real-world environments. However, existing methods still struggle with task-level adaptation, which demands generalization beyond simple variations within a task, and this limits their practical effectiveness. The limitation stems from several challenges, including poor task representations and inefficient policy learning, which result from underutilizing the hierarchical structure inherent in task-level adaptation. To address these challenges, we propose Coarse-to-Fine Policy Refinement combined with a Holistic-Local Contrastive Representation method to enable effective zero-shot policy adaptation. For policy learning, we use task language instructions as prior knowledge to select skill-specific expert modules, which together form a coarse policy. This coarse policy is then refined by a fine policy generated by a hypernetwork conditioned on task representations, yielding a task-aware policy. For task representation, we apply contrastive learning from both holistic and local perspectives to obtain representations that support more effective policy adaptation. Experimental results show that our method significantly improves learning efficiency and zero-shot adaptation on new tasks, outperforming previous methods by approximately 42.3% and 45.4% in success rate on the Meta-World ML-10 and ML-45 benchmarks, respectively.
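
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, not the authors' method. It assumes linear expert modules, cosine-similarity routing from an instruction embedding, a single-layer hypernetwork that adds a weight residual to the selected expert, and InfoNCE as the contrastive objective. All names, dimensions, and these modelling choices (`select_expert`, `refined_action`, `info_nce`, `skill_keys`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these come from the paper.
OBS_DIM, ACT_DIM, TASK_DIM, N_EXPERTS = 6, 4, 8, 3

# Coarse policy: one linear expert per skill (assumed stand-in for expert modules).
experts = rng.normal(scale=0.1, size=(N_EXPERTS, ACT_DIM, OBS_DIM))

# Hypernetwork: a single linear map from task representation to a flattened
# weight residual (assumed form of the "fine" policy).
hyper_w = rng.normal(scale=0.01, size=(ACT_DIM * OBS_DIM, TASK_DIM))


def select_expert(instruction_emb, skill_keys):
    """Return the index of the skill key with the highest cosine similarity."""
    a = instruction_emb / np.linalg.norm(instruction_emb)
    k = skill_keys / np.linalg.norm(skill_keys, axis=1, keepdims=True)
    return int(np.argmax(k @ a))


def refined_action(obs, expert_idx, task_repr):
    """Add the hypernetwork-generated residual to the expert weights, then act."""
    coarse_w = experts[expert_idx]
    delta_w = (hyper_w @ task_repr).reshape(ACT_DIM, OBS_DIM)
    return np.tanh((coarse_w + delta_w) @ obs)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of anchors matches row i of positives; other rows are negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Usage: route a noisy copy of skill key 1, then compute one refined action.
skill_keys = rng.normal(size=(N_EXPERTS, TASK_DIM))
instruction_emb = skill_keys[1] + 0.01 * rng.normal(size=TASK_DIM)
idx = select_expert(instruction_emb, skill_keys)
action = refined_action(rng.normal(size=OBS_DIM), idx, rng.normal(size=TASK_DIM))
```

In this sketch the expert supplies the skill-level behaviour and the residual adjusts it per task, so a zero task representation leaves the coarse policy unchanged. How the holistic and local views are constructed for the contrastive loss is not specified in the abstract and is left out here.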

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 13676
