Reinforcement Learning With Adaptive Policy Gradient Transfer Across Heterogeneous Problems

Published: 01 Jan 2024, Last Modified: 13 Nov 2024IEEE Trans. Emerg. Top. Comput. Intell. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: To date, transfer learning (TL) has been successfully applied for enhancing the learning performance of reinforcement learning (RL), and many transfer RL (TRL) approaches have been proposed in the literature. However, most of the existing TRL approaches consider knowledge transfer between RL tasks sharing the same state-action space. These methods thus may fail in cases where the RL tasks available for conducting knowledge transfer possess heterogeneous state-action spaces, which is common in many real-world applications. TRL across heterogeneous problem domains is challenging since the differences lie in the state-action spaces of the RL tasks are natural barriers in the knowledge transfer across tasks. This becomes more difficult if multiple heterogeneous source tasks are available when conducting knowledge transfer for a target RL task, as we have to identify the appropriate source task adaptively before performing knowledge transfer towards enhanced RL performance. In this article, we propose a new TRL algorithm with adaptive policy gradient transfer for the cases having multiple heterogeneous source RL tasks. The core ingredients of the proposed algorithm contain a source task selection module to select an appropriate task from a set of heterogeneous source tasks and a knowledge transfer module for conducting knowledge transfer across heterogeneous RL tasks. To investigate the performance of the proposed algorithm, we have conducted comprehensive empirical studies based on the well-known continuous robotic RL task with heterogeneous settings in the number of robot arms (links). The obtained results show that the proposed algorithm is effective and efficient in conducting knowledge transfer across heterogeneous problems for enhanced RL performance, over both the RL algorithm having no knowledge transfer in the learning process and the existing state-of-the-art TRL method.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview