Task Characteristic and Contrastive Contexts for Improving Generalization in Offline Meta-Reinforcement Learning

ICLR 2025 Conference Submission11750 Authors

27 Sept 2024 (modified: 13 Oct 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: Reinforcement Learning, Meta-Reinforcement Learning
TL;DR: We propose TCMRL, a framework that improves generalization in offline meta-RL by capturing both task characteristic and task contrastive information, yielding generalizable contexts and effective adaptation to unseen target tasks.
Abstract: Context-based offline meta-reinforcement learning (meta-RL) methods typically extract contexts that summarize task information from historical trajectories in order to adapt to unseen target tasks. However, previous methods may generalize poorly and adapt ineffectively. Our key insight is that they fail to capture both task characteristic and task contrastive information when generating contexts. In this work, we propose a framework called task characteristic and contrastive contexts for offline meta-RL (TCMRL), which consists of a task characteristic extractor and a task contrastive loss. Specifically, when generating contexts, the task characteristic extractor aims to identify the transitions within a trajectory that are characteristic of a task. Meanwhile, the task contrastive loss favors learning task information that distinguishes tasks from one another by considering interrelations among the transitions of trajectory subsequences. Contexts that include both task characteristic and task contrastive information provide a comprehensive understanding of the tasks themselves and of the implicit relationships among tasks. Experiments in meta-environments show that TCMRL surpasses previous offline meta-RL methods in generating more generalizable contexts and in achieving efficient and effective adaptation to unseen target tasks.
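The two ingredients named in the abstract can be illustrated with a small sketch: weighting transitions by how characteristic they are, and contrasting contexts across tasks. This is not the authors' implementation. The softmax pooling, the InfoNCE-style loss, the cosine similarity, the temperature value, and the function names `weighted_context`, `cosine`, and `info_nce` are all assumptions chosen only to make the idea concrete.

```python
import math

def weighted_context(transitions, scores):
    """Pool per-transition feature vectors into one context vector.

    `scores` are hypothetical per-transition relevance scores (assumed, not
    from the paper); a softmax turns them into weights so that high-scoring
    transitions dominate the resulting context.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(transitions[0])
    return [sum(w * t[d] for w, t in zip(weights, transitions)) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def info_nce(contexts, task_ids, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for a task contrastive loss).

    Contexts that share a task id are treated as positives; every other
    context in the batch acts as a negative.
    """
    losses = []
    n = len(contexts)
    for i in range(n):
        positives = [j for j in range(n) if j != i and task_ids[j] == task_ids[i]]
        if not positives:
            continue
        logits = {j: cosine(contexts[i], contexts[j]) / temperature
                  for j in range(n) if j != i}
        top = max(logits.values())
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits.values()))
        for j in positives:
            losses.append(log_denom - logits[j])
    return sum(losses) / len(losses) if losses else 0.0

# Toy usage: contexts that cluster by task should incur a lower loss
# than contexts whose task labels cut across the clusters.
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
task_ids = [0, 0, 1, 1]
loss_separated = info_nce(separated, task_ids)
loss_mixed = info_nce(mixed, task_ids)
```

In this toy setup, a lower loss for `separated` than for `mixed` reflects the intuition that contexts from the same task should sit closer together than contexts from different tasks.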
Primary Area: reinforcement learning
Submission Number: 11750