Abstract: The MuZero reinforcement learning method has achieved superhuman performance at games, and advances that allow MuZero to contend with complex actions now make MuZero-class methods applicable to real-world decision-making applications. However, some real-world applications are susceptible to state perturbations caused by malicious attacks and noisy sensors. To enhance the robustness of MuZero-class methods to state perturbations, we propose RobustZero, the first MuZero-class method that is $\underline{robust}$ to worst-case and random-case state perturbations, with $\underline{zero}$ prior knowledge of the environment's dynamics. We present a training framework for RobustZero that features a self-supervised representation network targeting the generation of a consistent initial hidden state, which is key to obtaining consistent policies before and after state perturbations, together with a unique loss function that facilitates robustness. We also present an adaptive adjustment mechanism that guides model updates, enhancing robustness to both worst-case and random-case state perturbations. Experiments on two classical control environments, three energy system environments, three transportation environments, and four MuJoCo environments demonstrate that RobustZero outperforms state-of-the-art methods at defending against state perturbations.
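To make the self-supervised consistency idea concrete, the sketch below shows one plausible way such a term could look: the representation network is encouraged to map a clean observation and a perturbed copy to (nearly) the same initial hidden state, so the downstream policy stays consistent under perturbation. This is an illustrative assumption, not the paper's exact loss; `ReprNet`, `consistency_loss`, `epsilon`, and the cosine-similarity objective are all hypothetical choices.

```python
# Hypothetical sketch of a self-supervised consistency term (illustrative only;
# not RobustZero's actual loss). A clean observation and a perturbed copy should
# yield similar initial hidden states from the representation network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReprNet(nn.Module):
    """Toy MuZero-style representation network: observation -> hidden state."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, hidden_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def consistency_loss(repr_net: ReprNet, obs: torch.Tensor, epsilon: float = 0.05) -> torch.Tensor:
    """Penalize disagreement between hidden states of a clean observation and a
    randomly perturbed copy. A worst-case variant would instead generate the
    perturbation with an attack (e.g., projected gradient descent)."""
    perturbed = obs + epsilon * torch.randn_like(obs)   # random-case perturbation
    h_clean = repr_net(obs)
    h_pert = repr_net(perturbed)
    # 1 - cosine similarity, a common self-supervised consistency objective
    return (1.0 - F.cosine_similarity(h_clean, h_pert, dim=-1)).mean()

# Usage: add the consistency term to the standard MuZero loss with some weight.
repr_net = ReprNet(obs_dim=8)
obs_batch = torch.randn(32, 8)
loss = consistency_loss(repr_net, obs_batch)
loss.backward()
```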
Lay Summary: MuZero, a recent reinforcement learning (RL) method, has achieved remarkable success in games, surpassing human performance. Its strengths have enabled its adoption in real-world decision-making tasks, e.g., autonomous driving and voltage control. However, in these settings, systems often encounter state perturbations, i.e., errors in input states caused by sensor noise or malicious attacks. These perturbations can mislead the MuZero agent into suboptimal or unsafe decisions.
To address this challenge, we propose RobustZero, a novel robust RL method that extends MuZero to defend against state perturbations. RobustZero incorporates contrastive learning and an adaptive adjustment mechanism to produce consistent and robust policies before and after perturbations. Notably, RobustZero consistently outperforms existing methods, particularly in environments with noisy or adversarial inputs.
Our results highlight the importance of robustness in RL and provide insights into designing agents that remain reliable even under imperfect observations.
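As a rough illustration of the adaptive adjustment mentioned above, one simple possibility is to adjust the weight on the robustness term from a running estimate of how inconsistent the hidden states currently are, so updates emphasize robustness more when perturbations hurt more. The mechanism below is purely a hypothetical sketch; `AdaptiveWeight`, `beta`, and the clipping bounds are assumptions, not the paper's actual rule.

```python
# Illustrative sketch of an adaptive weighting rule (not RobustZero's actual
# mechanism): track an exponential moving average of the consistency loss and
# scale the robustness term accordingly.
class AdaptiveWeight:
    def __init__(self, beta: float = 0.99, lambda_min: float = 0.1, lambda_max: float = 10.0):
        self.beta = beta
        self.lambda_min = lambda_min
        self.lambda_max = lambda_max
        self.ema = 0.0  # running estimate of the consistency loss

    def update(self, consistency_value: float) -> float:
        """Return the current weight for the robustness term in the total loss."""
        self.ema = self.beta * self.ema + (1.0 - self.beta) * consistency_value
        # Larger recent inconsistency -> larger weight, clipped to a safe range.
        return max(self.lambda_min, min(self.lambda_max, 1.0 + self.ema))

# Usage, e.g. inside a training step:
#   total_loss = muzero_loss + weight.update(cons.item()) * cons
weight = AdaptiveWeight()
print(weight.update(0.3))  # weight applied to the robustness term this step
```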
Primary Area: Reinforcement Learning->Deep RL
Keywords: Reinforcement learning, state perturbation, MuZero, deep learning
Submission Number: 4748