Keywords: Reinforcement learning, Cross-domain transfer, Transfer learning, Preference-based RL
TL;DR: We study the cross-domain RL problem from the perspective of preference-based learning.
Abstract: We study the cross-domain RL (CDRL) problem from the perspective of preference-based learning. We identify a critical correspondence identifiability issue (CII) in existing unsupervised CDRL methods and propose to mitigate it with the weak supervision of preference feedback. Specifically, we propose the principle of cross-domain preference consistency (CDPC), which serves as additional guidance for learning a proper correspondence between the source and target domains. To substantiate this principle, we present an algorithm that integrates a state decoder, trained with a preference consistency loss, and a cross-domain MPC method for action selection during inference. Through extensive experiments in both MuJoCo and Robosuite, we demonstrate that CDPC can achieve more effective and data-efficient knowledge transfer across domains than state-of-the-art CDRL benchmark methods.
Submission Number: 17