Dynamic Modeling for Reinforcement Learning with Random Delay

Published: 2024, Last Modified: 22 Jan 2026 · ICANN (4) 2024 · CC BY-SA 4.0
Abstract: Delays in real-world tasks degrade the performance of standard reinforcement learning (RL), which assumes that environmental feedback and action execution are instantaneous. Many approaches have been proposed in the RL community to address observation delay or action delay. However, previous methods suffer from inaccurate state predictions caused by accumulated error, are limited to tasks with a specific action space, or cannot handle the more complicated case of random delays. Motivated by these problems, in this paper we propose a new algorithm named Prediction model with Arbitrary Delay (PAD), which aims to predict delayed states more accurately through a gated unit for better decision making, especially in environments with random delays. Specifically, the proposed method substantially alleviates cumulative errors by using a multi-step prediction model and can be applied to different kinds of tasks by virtue of its model structure. Experiments on continuous and discrete control tasks demonstrate that PAD achieves higher performance than state-of-the-art methods for handling delays in RL.
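
To make the idea of gated multi-step state prediction concrete, the sketch below shows one plausible realization: the last observed (delayed) state is encoded into a hidden vector, and a gated recurrent unit rolls it forward once per buffered action, so the policy can act on an estimate of the current state. This is a minimal illustration under our own assumptions, not the authors' implementation; the class name `StatePredictor`, the use of a GRU cell, and all hyperparameters are hypothetical.

```python
# Minimal sketch (not the paper's code) of a gated multi-step state predictor
# for delayed RL: roll the last observed state forward, one gated update per
# buffered action, instead of chaining a single-step model on its own output.
import torch
import torch.nn as nn


class StatePredictor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.Linear(state_dim, hidden_dim)   # embed the last observed state
        self.gru = nn.GRUCell(action_dim, hidden_dim)     # gated unit driven by the actions
        self.decoder = nn.Linear(hidden_dim, state_dim)   # decode the predicted current state

    def forward(self, last_state: torch.Tensor, pending_actions: torch.Tensor) -> torch.Tensor:
        """last_state: (B, state_dim); pending_actions: (B, d, action_dim) for delay d."""
        h = torch.tanh(self.encoder(last_state))
        for t in range(pending_actions.size(1)):          # one gated update per delayed step
            h = self.gru(pending_actions[:, t], h)
        return self.decoder(h)                            # estimate of the true current state


# Usage: with a random delay of d steps, feed the d actions buffered since the
# delayed observation; the policy then conditions on s_hat instead of s_delayed.
predictor = StatePredictor(state_dim=17, action_dim=6)
s_delayed = torch.randn(32, 17)
a_buffer = torch.randn(32, 4, 6)        # delay d = 4 at this step
s_hat = predictor(s_delayed, a_buffer)  # shape (32, 17)
```

Because the recurrent hidden state, rather than a fully decoded state, is carried across the delay steps, decoding errors are not re-fed into the model at every step, which is one way a multi-step predictor can limit error accumulation under random delays.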