Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

Xiong-Hui Chen; Shengyi Jiang; Feng Xu; Zongzhang Zhang; Yang Yu

Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone

Keywords: Reinforcement Learning

Abstract: In visual-input sim-to-real scenarios, to overcome the reality gap between images rendered in simulators and those from the real world, domain adaptation, i.e., learning an aligned representation space between simulators and the real world, then training and deploying policies in the aligned representation, is a promising direction. Previous methods focus on same-modal domain adaptation. However, those methods require building and running simulators that render high-quality images, which can be difficult and costly. In this paper, we consider a more cost-efficient setting of visual-input sim-to-real where only low-dimensional states are simulated. We first point out that the objective of learning mapping functions in previous methods that align the representation spaces is ill-posed, prone to yield an incorrect mapping. When the mapping crosses modalities, previous methods are easier to fail. Our algorithm, Cross-mOdal Domain Adaptation with Sequential structure (CODAS), mitigates the ill-posedness by utilizing the sequential nature of the data sampling process in RL tasks. Experiments on MuJoCo and Hand Manipulation Suite tasks show that the agents deployed with our method achieve similar performance as it has in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

TL;DR: We propose Cross-mOdal Domain Adaptation with Sequential structure (CODAS) to mitigate the ill-posedness of the objective in previous unsupervised domain adaptation framework for RL tasks.

Code: https://github.com/xionghuichen/CODAS

18 Replies

Loading