Modeling Dynamics under Random Delays in Reinforcement Learning

Published: 19 Dec 2025, Last Modified: 05 Jan 2026, AAMAS 2026 Full, CC BY 4.0
Keywords: Random Delay, Model-based Reinforcement Learning
Abstract: Reinforcement learning in real-world systems often encounters delays in sensing and actuation, violating the standard Markov decision process (MDP) assumptions of immediate and fully observed states. While world models offer a promising approach to solving such random-delayed MDPs by imagining undelayed environment dynamics, random actuation delays introduce uncertainty that hinders their direct application. Specifically, world models require the executed actions, rather than the issued ones, to accurately imagine the current state. We present a novel analysis that distinguishes the effects of observation and action delays on world models, revealing an asymmetry that can be exploited to improve the learning process. To address the uncertainty caused by stochastic action-execution delays, we propose representing imagined latent states as expected latent states: probability-weighted averages over all possible action-execution trajectories. Compared to sampling a latent state along a single possible action-execution trajectory, the expected latent reduces variance in training targets and captures multiple plausible futures at inference. We instantiate our approach using DreamerV3 and validate it on the DeepMind Control Suite with visual inputs. Experimental results show that our method achieves significantly higher returns, more accurate dynamics predictions, and improved training stability across a wide range of delay settings compared to strong baselines.
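The following is a minimal sketch, not the authors' implementation, of the expected-latent idea described above: the imagined next latent is a probability-weighted average over the candidate actions that could actually be executed under a random actuation delay. All names (`transition`, `candidate_actions`, `delay_probs`) are hypothetical placeholders for components of a world model such as DreamerV3's dynamics head.

```python
# Hypothetical sketch of an expected latent state under random action delay.
import torch


def expected_latent(transition, z_prev, candidate_actions, delay_probs):
    """Probability-weighted average of imagined next latent states.

    transition(z, a) -> next latent (e.g., a world-model dynamics head)
    z_prev            : previous latent state, shape [B, D]
    candidate_actions : actions issued at different past steps, any of which
                        may be the one executed now, shape [K, B, A]
    delay_probs       : probability that each candidate is executed,
                        shape [K], summing to 1
    """
    # Imagine one next latent per possible executed action.
    next_latents = torch.stack(
        [transition(z_prev, a) for a in candidate_actions]
    )  # [K, B, D]

    # Weight each imagined latent by its execution probability and average.
    weights = delay_probs.view(-1, 1, 1)  # broadcast over batch and latent dims
    return (weights * next_latents).sum(dim=0)  # expected latent, [B, D]
```

Under this reading, the expectation replaces a single sampled rollout over executed actions, which is what reduces the variance of the training target while still reflecting all plausible futures at inference time.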
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 361