An advantage-based policy transfer algorithm for reinforcement learning with measures of transferability

TMLR Paper 3921 Authors

09 Jan 2025 (modified: 15 Jan 2025) · Under review for TMLR · CC BY 4.0
Abstract: Reinforcement learning (RL) enables sequential decision-making in complex and high-dimensional environments through direct interaction with the environment. In most real-world applications, however, a large number of interactions is infeasible. In such settings, transfer RL algorithms, which transfer knowledge from one or multiple source environments to a target environment, have been shown to increase learning speed and improve initial and asymptotic performance. However, most existing transfer RL algorithms are on-policy and sample inefficient, fail in adversarial target tasks, and often require heuristic choices in algorithm design. This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed-domain environments. Its novelty lies in using the popular notion of ``advantage'' as a regularizer, to weigh the knowledge that should be transferred from the source against new knowledge learned in the target, removing the need for heuristic choices. Further, we propose a new transfer performance measure to evaluate the performance of our algorithm and unify existing transfer RL frameworks. Finally, we present a scalable, theoretically-backed task similarity measurement algorithm to illustrate the alignment between our proposed transferability measure and the similarity between source and target environments. We compare APT-RL with several baselines, including existing transfer RL algorithms, on three high-dimensional continuous control tasks. Our experiments demonstrate that APT-RL outperforms existing transfer RL algorithms and is at least as good as learning from scratch in adversarial tasks.
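To illustrate the idea described in the abstract, the following is a minimal, hypothetical sketch (not the authors' implementation) of how an advantage term could weigh a source-knowledge regularizer against the target-task actor loss. The function name, the loss form, and the regularization coefficient `beta` are illustrative assumptions based only on the abstract.

```python
# Hypothetical sketch of an advantage-weighted transfer regularizer.
# All names (apt_rl_style_actor_loss, beta, source_advantage) are illustrative
# assumptions; the actual APT-RL objective is defined in the paper.
import numpy as np

def apt_rl_style_actor_loss(target_loss, log_prob_source_action, source_advantage, beta=1.0):
    """Combine the target-task actor loss with a source-knowledge term.

    The source term is weighted by the estimated advantage of the source
    policy's action in the target task: a positive advantage encourages the
    target policy to imitate the source action, while a negative advantage
    discourages it, so the weighting adapts without a heuristic schedule.
    """
    transfer_term = -source_advantage * log_prob_source_action
    return target_loss + beta * np.mean(transfer_term)

# Toy usage with random placeholder values for a batch of 32 transitions.
rng = np.random.default_rng(0)
loss = apt_rl_style_actor_loss(
    target_loss=1.2,                               # scalar actor loss on the target task
    log_prob_source_action=rng.normal(size=32),    # log pi_theta(a_source | s)
    source_advantage=rng.normal(size=32),          # advantage estimates of source actions
    beta=0.5,
)
print(f"combined actor loss: {loss:.3f}")
```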
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=1yRo6jwMb7
Changes Since Last Submission: We have addressed all reviewer comments, added significant additional experiments including high-dimensional complex tasks, provided additional clarification for better readability, and fixed notation. Specifically, we have made the following changes:
1. We have added clarification to address comments by reviewer 25bT.
2. A new plot in Figure 5 (a-c) shows an ablation study of the regularization parameter $\beta$, as suggested by reviewer 25bT.
3. Following the suggestions of reviewers 25bT, chQa, and Hsjo, we have added fine-tuning as a baseline in all experiments reported in the updated draft. This can be seen in the Figure 3 (a-d) plots.
4. We have reported the results in all figures in a uniform format, as pointed out by reviewers Hsjo and F9xj.
5. We fixed a minor bug in the policy update formula of our code; the newly reported results in Figures 3 and 4 show the corrected values.
6. We have added a new limitations section, as suggested by reviewers 25bT and F9xj.
7. We have added clarification for the comments provided by reviewer chQa.
8. We have taken into account the comment of reviewer Hsjo regarding measure-theoretic terms. The modified text and title of the draft now use 'measures of transferability' instead of 'metrics of transferability'. We note that our transferability measure can take negative values; this is an intentional choice to indicate poor transfer performance for less suitable source tasks.
9. The figures have been updated as suggested by reviewer Hsjo.
10. Based on the comments of reviewer F9xj, we have fixed the notation issue and added clarifications. We also address the comments on random-policy-based data collection in the newly added limitations section.
11. To address the comment of reviewer F9xj about the baseline algorithm REPAINT, we have added a plot of this baseline in Figure 1 of Appendix B. As REPAINT is an on-policy algorithm based on PPO, we compare REPAINT against PPO in this plot. While REPAINT works better than PPO on the most similar task, it performs poorly on the remaining tasks. In addition, our approach, APT-RL, significantly outperforms REPAINT, as can be seen from Figure 4 in the main draft and Figure 1 in the appendix.
12. Following the comment of reviewer F9xj, and to clearly show the capabilities of our algorithm, we have removed the simpler 'pendulum' task and added a complex 'humanoid' task to the experiments. We created several adversarial target tasks, and our algorithm still outperforms learning from scratch.
13. Based on the comment of reviewer F9xj, we have removed Table 2 of the previous draft for better readability.
Assigned Action Editor: ~Romain_Laroche1
Submission Number: 3921
