Keywords: Cross-domain transfer; Transfer learning; Reinforcement learning
Abstract: Cross-domain reinforcement learning (CDRL) aims to improve the data efficiency of RL by leveraging data samples collected in a source domain to facilitate learning in a similar target domain. Despite its potential, cross-domain transfer in RL faces two fundamental and intertwined challenges: (i) the source and target domains can have distinct representations (in either states or actions), which makes direct transfer infeasible and thereby requires sophisticated inter-domain mappings; (ii) domain similarity in RL is not easily identifiable a priori, and hence CDRL is prone to negative transfer.
In this paper, we propose to jointly tackle these two challenges through the lens of hybrid Q functions. Specifically, we propose $Q$Avatar, which combines the Q functions of the source and target domains via a proper weight decay function. Through this design, we characterize the convergence behavior of $Q$Avatar and thereby show that it achieves robust transfer: it effectively leverages a source-domain Q function for knowledge transfer to the target domain, regardless of the quality of the source-domain model or the degree of domain similarity.
Through extensive experiments, we demonstrate that $Q$Avatar achieves superior transferability across domains on a variety of RL benchmark tasks, including locomotion and robot arm manipulation, even in scenarios with potential negative transfer.
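Since the abstract describes the hybrid Q-function idea only at a high level, the following is a minimal sketch of one plausible reading: a convex combination of source- and target-domain Q estimates with a decaying mixing weight. The names `decay_weight` and `hybrid_q`, and the specific schedule, are illustrative assumptions; the actual $Q$Avatar combination rule is defined in the paper, not here.

```python
import numpy as np

def decay_weight(t, beta0=1.0, rate=1e-3):
    """Hypothetical decay schedule (an assumption, not the paper's): beta_t -> 0
    as t grows, so reliance on the source-domain Q function vanishes over time."""
    return beta0 / (1.0 + rate * t)

def hybrid_q(q_source, q_target, t):
    """Combine source- and target-domain Q estimates with a decaying weight.

    q_source, q_target: arrays of Q-values over the (mapped) action set.
    As beta_t -> 0 the hybrid estimate reduces to the target Q function,
    matching the robustness intuition: a poor source-domain model cannot
    dominate the target-domain learning in the long run.
    """
    beta = decay_weight(t)
    return beta * q_source + (1.0 - beta) * q_target

# Usage: pick the greedy action under the hybrid estimate.
q_src = np.array([0.2, 0.8, 0.5])  # source-domain Q-values (after inter-domain mapping)
q_tgt = np.array([0.1, 0.3, 0.9])  # target-domain Q-values
action = int(np.argmax(hybrid_q(q_src, q_tgt, t=1000)))
```

Under this reading, the decaying weight is what hedges against negative transfer: early on the agent exploits source knowledge, while in the limit it relies only on the target-domain estimate.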
Submission Number: 62