Abstract: Meta-Reinforcement Learning (Meta-RL) is a machine learning paradigm aimed at learning reinforcement learning policies that can quickly adapt to unseen tasks from few-shot data. Nevertheless, applying Meta-RL to real-world applications remains challenging due to the cost of data acquisition. To address this problem, offline Meta-RL has emerged as a promising solution, focusing on learning policies from pre-collected data that can effectively and rapidly adapt to unseen tasks. In this paper, we propose a new offline Meta-RL method called Meta-Actor-Critic with Evolving Gradient Agreement (MACEGA). MACEGA employs an evolutionary approach to estimate meta-gradients conducive to generalization across unseen tasks. During meta-training, gradient evolution is used to meta-update the value network and policies. Moreover, we use gradient agreement as an optimization objective for meta-learning, thereby enhancing the generalization ability of the meta-policy. We experimentally demonstrate the robustness of MACEGA to variations in offline data quality. Furthermore, extensive experiments on various benchmarks provide empirical evidence that MACEGA outperforms previous state-of-the-art methods in generalizing to unseen tasks, demonstrating its potential for real-world applications.
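To make the gradient-agreement idea referenced in the abstract more concrete, the sketch below shows a generic agreement-weighted combination of per-task gradients: each task gradient is weighted by its inner product with the average gradient, so tasks whose gradients conflict with the majority contribute less to the meta-update. This is a minimal illustration under assumed details, not the authors' MACEGA algorithm; the function names (`gradient_agreement_weights`, `agreement_weighted_meta_gradient`) and the normalization scheme are hypothetical.

```python
import numpy as np

def gradient_agreement_weights(task_grads):
    """Weight each per-task gradient by its agreement (inner product)
    with the average gradient across tasks. Illustrative sketch of a
    gradient-agreement objective, not the paper's exact formulation."""
    g_avg = np.mean(task_grads, axis=0)
    # Agreement score: inner product of each task gradient with the average.
    scores = np.array([np.dot(g, g_avg) for g in task_grads])
    # Normalize scores so their magnitudes sum to 1 (assumed scheme).
    return scores / (np.sum(np.abs(scores)) + 1e-8)

def agreement_weighted_meta_gradient(task_grads):
    """Combine per-task gradients into a single meta-gradient,
    down-weighting tasks whose gradients disagree with the majority."""
    weights = gradient_agreement_weights(task_grads)
    return np.sum(weights[:, None] * task_grads, axis=0)

# Toy usage: three per-task gradients for a 4-dimensional parameter vector.
task_grads = np.array([
    [ 1.0,  0.5, -0.2,  0.1],
    [ 0.9,  0.4, -0.1,  0.2],
    [-0.8, -0.3,  0.4, -0.2],  # a conflicting task gradient
])
print("agreement weights:", gradient_agreement_weights(task_grads))
print("meta-gradient:", agreement_weighted_meta_gradient(task_grads))
```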
External IDs: dblp:conf/iros/ChenYCLMHL24