Abstract: The limited transferability of learned policies is a major challenge that restricts the applicability of learning-based solutions to decision-making tasks. In this paper, we present a simple method for aligning latent state representations across different domains using unaligned trajectories of proxy tasks. Once alignment is complete, policies trained on the shared representation can be transferred to another domain without further interaction. Our key finding is that multi-domain behavioral cloning is a powerful means of shaping a shared latent space. We also observe that the commonly used domain-discriminative objective for distribution matching can be overly restrictive, potentially disrupting the latent state structure of each domain. As an alternative, we propose using maximum mean discrepancy (MMD) as a regularizer. Since our method focuses on capturing shared structures, it does not require discovering the exact cross-domain correspondence that existing methods aim for. Furthermore, our approach involves training only a single multi-domain policy, making it easy to extend. We evaluate our method across various domain shifts, including cross-robot and cross-viewpoint settings, and demonstrate that our approach outperforms existing methods that employ adversarial domain translation. We also conduct ablation studies to investigate the effectiveness of each loss component under different domain shifts.
Submission Number: 31
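As a rough illustration of the training objective described in the abstract, the sketch below combines multi-domain behavioral cloning, with per-domain encoders feeding a single shared policy, and an MMD penalty that aligns the two latent state distributions in place of an adversarial domain discriminator. All names, architectures, and hyperparameters (`rbf_mmd`, `mmd_weight`, the encoder input dimensions, the squared-error BC loss) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of the objective outlined in the abstract (assumed details):
# two domain-specific encoders map states into a shared latent space, a single
# multi-domain policy is trained by behavioral cloning on unaligned proxy-task
# trajectories, and an MMD penalty aligns the two latent distributions.
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    """Biased MMD^2 estimate between two batches of latent states (RBF kernel)."""
    def gram(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

latent_dim, act_dim = 64, 6  # assumed sizes
enc_a = nn.Sequential(nn.Linear(17, 128), nn.ReLU(), nn.Linear(128, latent_dim))  # domain A encoder
enc_b = nn.Sequential(nn.Linear(23, 128), nn.ReLU(), nn.Linear(128, latent_dim))  # domain B encoder
policy = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))  # shared policy

opt = torch.optim.Adam(
    [*enc_a.parameters(), *enc_b.parameters(), *policy.parameters()], lr=3e-4
)
mmd_weight = 1.0  # assumed regularization strength

def training_step(states_a, actions_a, states_b, actions_b):
    """One multi-domain BC step on unaligned proxy-task batches from each domain."""
    z_a, z_b = enc_a(states_a), enc_b(states_b)
    # Behavioral cloning on both domains through the single shared policy head.
    bc_loss = ((policy(z_a) - actions_a) ** 2).mean() + ((policy(z_b) - actions_b) ** 2).mean()
    # MMD regularization replaces the adversarial domain-discriminative critic.
    loss = bc_loss + mmd_weight * rbf_mmd(z_a, z_b)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because the MMD term only matches the two latent distributions in aggregate, this kind of objective does not rely on state-level correspondences between the domains, consistent with the abstract's claim that exact cross-domain correspondence is not required.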