Keywords: Context-based offline meta-reinforcement learning, meta-reinforcement learning, offline reinforcement learning.
TL;DR: We propose a framework called task characteristic contexts for offline meta-RL (TCMRL), which captures task-characteristic information to produce generalizable contexts and enable effective adaptation to unseen target tasks.
Abstract: Context-based offline meta-reinforcement learning (meta-RL) methods typically extract contexts that summarize task information from historical trajectories to achieve adaptation to unseen target tasks. Nevertheless, previous methods suffer from context shift, which arises from the mismatch between the behavior policy and the context-based policy as well as from the distinctness among tasks, leading to poor generalization and limited adaptation. Our key insight is that existing methods overlook task-characteristic information, which not only reflects task-specific properties but also distinguishes tasks from one another; this oversight hinders the extraction and utilization of contexts during adaptation. To address this issue, we propose a framework called task characteristic contexts for offline meta-RL (TCMRL). We consider such task-characteristic information to be directly related to the task properties, which consist of both reward functions and transition dynamics, and to the interrelations among transitions. More specifically, we design a characteristic metric based on context-based reward and state estimators, which use the task properties to construct relationships among contexts extracted from entire trajectories. Moreover, we introduce a cyclic interrelation that captures the interrelations among transitions within sequential sub-trajectories from forward, backward, and inverse perspectives. Contexts enriched with task-characteristic information provide a comprehensive understanding of each task and the implicit relationships among tasks, enabling effective extraction and utilization of contexts during adaptation. Experiments in meta-environments demonstrate the superiority of TCMRL over existing offline meta-RL methods in generating more generalizable contexts and achieving effective adaptation to unseen target tasks.
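The abstract names the two learning components without spelling them out, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes simple context-conditioned MLP estimators trained by reconstruction; all names here (TaskPropertyEstimators, characteristic_metric_loss, cyclic_interrelation_loss, and the fwd/bwd/inv networks) are hypothetical illustrations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskPropertyEstimators(nn.Module):
    """Hypothetical context-conditioned reward and state estimators.

    Given a transition (s, a) and a task context z, predict the reward and
    the next state, covering the two task properties the abstract names:
    reward functions and transition dynamics.
    """

    def __init__(self, state_dim, action_dim, context_dim, hidden=128):
        super().__init__()
        in_dim = state_dim + action_dim + context_dim
        self.reward_head = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.state_head = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, state_dim))

    def forward(self, s, a, z):
        x = torch.cat([s, a, z], dim=-1)
        return self.reward_head(x), self.state_head(x)


def characteristic_metric_loss(est, s, a, r, s_next, z):
    """One reading of the characteristic metric: a context z carries
    task-characteristic information if it lets the estimators reconstruct
    rewards and next states along the entire trajectory."""
    r_hat, s_next_hat = est(s, a, z)
    return F.mse_loss(r_hat, r) + F.mse_loss(s_next_hat, s_next)


def cyclic_interrelation_loss(fwd, bwd, inv, s, a, s_next, z):
    """Sketch of the three perspectives on a sequential sub-trajectory:
    forward predicts s_next from (s, a, z), backward recovers s from
    (s_next, a, z), and inverse infers a from (s, s_next, z).
    fwd, bwd, and inv are assumed to be context-conditioned MLPs."""
    loss_fwd = F.mse_loss(fwd(torch.cat([s, a, z], dim=-1)), s_next)
    loss_bwd = F.mse_loss(bwd(torch.cat([s_next, a, z], dim=-1)), s)
    loss_inv = F.mse_loss(inv(torch.cat([s, s_next, z], dim=-1)), a)
    return loss_fwd + loss_bwd + loss_inv
```

Under this reading, both losses are auxiliary objectives on the context encoder: the first ties contexts to task properties over whole trajectories, while the second ties them to transition interrelations within sub-trajectories.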
Primary Area: reinforcement learning
Submission Number: 17802