Abstract: Reinforcement learning (RL) can produce high-performance control policies for complex tasks in simulation through an end-to-end approach. However, such RL policies are not robust to the uncertainties caused by modeling mismatch between the simulated and real environments, making them difficult to transfer to the real world. To address this challenge, this letter introduces a lightweight and efficient robust RL algorithm. The algorithm changes the adversary's optimization objective from a long-term cumulative reward to a short-term reward, focusing the adversary on near-term performance. Additionally, the adversarial actions are projected onto a finite subset of the perturbation space using projected gradient descent, which constrains the adversary's strength and yields more robust policies. Extensive experiments in both simulated and real environments show that our algorithm improves the policy's generalization under modeling mismatch, outperforming the strongest prior methods in almost all environments.
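To make the projection idea concrete, the sketch below shows a generic projected-gradient update for a bounded adversarial perturbation: the adversary takes a gradient step against a short-term reward signal and is then projected back into an admissible set. This is an illustrative sketch only, not the paper's implementation; the function names, the choice of an L-infinity ball as the constraint set, and parameters such as `eps` and `step_size` are assumptions for illustration.

```python
import numpy as np


def project_to_ball(delta, eps):
    """Project a perturbation onto an L-infinity ball of radius eps
    (i.e., clip each coordinate into [-eps, eps])."""
    return np.clip(delta, -eps, eps)


def adversary_pgd_step(delta, grad_short_term_reward, step_size, eps):
    """One projected-gradient step for the adversary.

    The adversary moves against the gradient of the protagonist's
    short-term reward and is then projected back into the bounded
    perturbation set, limiting the adversary's strength.
    """
    delta = delta - step_size * grad_short_term_reward
    return project_to_ball(delta, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = np.zeros(4)                # perturbation applied by the adversary
    for _ in range(10):
        fake_grad = rng.normal(size=4) # placeholder for d(short-term reward)/d(delta)
        delta = adversary_pgd_step(delta, fake_grad, step_size=0.05, eps=0.1)
    print("within bounds:", bool(np.all(np.abs(delta) <= 0.1)), delta)
```

In a full pipeline, `grad_short_term_reward` would come from differentiating an estimate of the near-term return with respect to the adversary's action, and the projection radius would control how strong a disturbance the policy must learn to tolerate.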