Abstract: We propose a neural network architecture for domain adaptation in reinforcement learning. The architecture learns similar latent representations for similar observations from different environments without access to a one-to-one correspondence between the observations. The model aligns the latent codes by learning shared dynamics across environments and matching the marginal distributions of the latent codes. Furthermore, a single policy trained on the latent representations of one environment simultaneously acts optimally across the different environments.
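The abstract describes two alignment signals: shared latent dynamics and marginal distribution matching. The following is a hypothetical numerical sketch of that idea, not the paper's implementation: per-environment linear encoders, a shared dynamics map, and an MMD penalty (one common choice for matching marginals; the paper's exact matching objective may differ). All function names, shapes, and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(W, obs):
    """Illustrative linear encoder: observation -> latent code."""
    return obs @ W

def shared_dynamics(A, z, action):
    """Shared latent dynamics used by both environments: predicts z_{t+1}
    from the concatenated latent code and action."""
    return np.concatenate([z, action], axis=-1) @ A

def mmd(z1, z2, sigma=1.0):
    """Biased squared maximum mean discrepancy with an RBF kernel,
    a standard way to match two marginal latent distributions."""
    def k(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2.0 * sigma ** 2))
    return k(z1, z1).mean() + k(z2, z2).mean() - 2.0 * k(z1, z2).mean()

# Illustrative batches from two environments with different observation sizes.
obs_a = rng.normal(size=(64, 10))
obs_b = rng.normal(size=(64, 12))
act = rng.normal(size=(64, 2))

# Separate encoders per environment, one shared dynamics matrix.
W_a = rng.normal(size=(10, 4))
W_b = rng.normal(size=(12, 4))
A = rng.normal(size=(4 + 2, 4))

z_a = encode(W_a, obs_a)
z_b = encode(W_b, obs_b)
z_a_next_pred = shared_dynamics(A, z_a, act)

# The marginal-alignment term; in training it would be combined with a
# dynamics-prediction loss on z_a_next_pred (weights are assumptions).
alignment_loss = mmd(z_a, z_b)
print(alignment_loss >= 0.0)  # biased MMD^2 estimate is nonnegative
```

Because the policy is a function of the latent code only, driving `alignment_loss` down makes a policy trained on one environment's latents directly applicable to the other's.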