Highlights
• Sample inefficiency is one of the sim-to-real problems in reinforcement learning.
• Existing representation learning with Siamese networks lacks task-related features.
• Siamese Q fuses task information into the representation used for Q values.
• A partial observation fusion model fuses sequential information into the representation.
• The policy trained in the simulator can be transferred directly to the real world.