Abstract: Reinforcement learning (RL) can produce high-performance control policies for complex tasks in simulation through an end-to-end approach. However, such RL policies are not robust to the uncertainties caused by modeling mismatch between the simulated and real environments, making them difficult to transfer to the real world. To address this challenge, this letter introduces a lightweight and efficient robust RL algorithm. The algorithm changes the adversary's optimization objective from a long-term cumulative reward to a short-term reward, focusing the adversary on near-term performance. Additionally, the adversarial actions are projected onto a finite subset of the perturbation space using projected gradient descent, which constrains the adversary's strength and yields more robust policies. Extensive experiments in both simulated and real environments show that our algorithm improves the policy's generalization under modeling mismatch, outperforming the strongest prior methods in almost all environments.
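To make the projection idea concrete, the sketch below shows a generic projected-gradient update for a bounded adversarial perturbation: the adversary takes a gradient step against a short-term reward signal and is then projected back into an admissible set. This is an illustrative sketch only, not the paper's implementation; the function names, the choice of an L-infinity ball as the constraint set, and parameters such as `eps` and `step_size` are assumptions for illustration.

```python
import numpy as np


def project_to_ball(delta, eps):
    """Project a perturbation onto an L-infinity ball of radius eps
    (i.e., clip each coordinate into [-eps, eps])."""
    return np.clip(delta, -eps, eps)


def adversary_pgd_step(delta, grad_short_term_reward, step_size, eps):
    """One projected-gradient step for the adversary.

    The adversary moves against the gradient of the protagonist's
    short-term reward and is then projected back into the bounded
    perturbation set, limiting the adversary's strength.
    """
    delta = delta - step_size * grad_short_term_reward
    return project_to_ball(delta, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = np.zeros(4)                # perturbation applied by the adversary
    for _ in range(10):
        fake_grad = rng.normal(size=4) # placeholder for d(short-term reward)/d(delta)
        delta = adversary_pgd_step(delta, fake_grad, step_size=0.05, eps=0.1)
    print("within bounds:", bool(np.all(np.abs(delta) <= 0.1)), delta)
```

In a full pipeline, `grad_short_term_reward` would come from differentiating an estimate of the near-term return with respect to the adversary's action, and the projection radius would control how strong a disturbance the policy must learn to tolerate.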