Abstract: The sim-to-real gap is a long-standing challenge in robotics that has attracted significant attention. Bridging it requires learning robust representations that transfer seamlessly between simulation and the real world. Traditional approaches such as domain randomization have succeeded in the zero-shot setting by increasing the diversity of simulated environments, which yields resilient and adaptable representations. However, they require extensive training across a wide range of parameter variations and depend on heuristic design choices. In this work, we present a novel reinforcement learning architecture named Soft Attention-Augmented Actor-Critic (Soft3AC) for sim-to-real robotic tasks that does not require heuristic domain randomization. Our approach learns semantically task-relevant feature representations that are resilient to appearance gaps. It does so through an architectural design that separates current perceptions from historical perceptions held in memory, fostering abstract spatial-temporal understanding, while an attention mechanism enables more contextual processing. We validated our method on a valve rotation task with a robotic hand under both sim-to-sim and sim-to-real conditions. The results indicate that our model bridges the appearance gap observed in both sim-to-sim and sim-to-real transfer, and that it can be deployed directly in the real world in a zero-shot manner without domain randomization.