Robust deterministic policy gradient for disturbance attenuation

ICLR 2026 Conference Submission 19676 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: robust reinforcement learning, deterministic policy gradient
Abstract: Reinforcement learning (RL) has achieved remarkable success across various control and decision-making tasks. However, RL agents often exhibit unstable and degraded performance when they encounter environments with unexpected external disturbances and model uncertainty. It is therefore crucial to develop RL agents that can sustain stable performance under such conditions. To address this issue, this paper proposes an RL algorithm called robust deterministic policy gradient (RDPG), based on adversarial RL and $H_\infty$ control methods. We formulate a max-min objective function motivated by $H_\infty$ control, which enables both the agent and the adversary to be trained in a stable and efficient manner. In this formulation, the agent seeks a robust policy that maximizes the objective function, while an adversary injects disturbances to minimize it. Furthermore, for high-dimensional continuous control tasks, we introduce robust deep deterministic policy gradient (RDDPG), which combines the robustness of RDPG with the stability and learning efficiency of deep deterministic policy gradient (DDPG). Experimental evaluations in MuJoCo environments demonstrate that the proposed RDDPG outperforms baseline algorithms in terms of robustness against both external disturbances and model parameter uncertainties.
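For intuition, a max-min objective of the kind described in the abstract can be sketched as below; the exact form used by the authors appears in the paper itself, and the symbols here ($\mu_\theta$ for the agent's deterministic policy, $\nu_\phi$ for the adversary's disturbance policy, $\eta > 0$ for the disturbance-attenuation weight, $\gamma$ for the discount factor) are illustrative assumptions rather than the authors' notation:

$$
\max_{\theta} \; \min_{\phi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r\big(s_t, \mu_\theta(s_t)\big) + \eta \,\big\| \nu_\phi(s_t) \big\|^{2} \Big) \right]
$$

Under this kind of formulation, the agent maximizes return while the adversary injects disturbances $w_t = \nu_\phi(s_t)$ into the dynamics to minimize it; the $+\eta\|w_t\|^{2}$ term penalizes disturbance energy so the adversary cannot be arbitrarily powerful, mirroring the $L_2$-gain bound that defines disturbance attenuation in $H_\infty$ control.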
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 19676