Keywords: Reinforcement Learning, Model Based, Delays, POMDPs
Abstract: Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume immediate feedback from the environment. We study random feedback delays in POMDPs, where observations may arrive out of sequence, a setting that has not previously been addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a filtering process within a model-based RL context that recursively updates the belief state based on incoming observations. We then introduce a simple delay-aware framework that incorporates this idea into RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach with delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to unseen delays at deployment. Additionally, we present experiments on more realistic robotic tasks, evaluating our method against common practical heuristics and emphasizing the importance of explicitly modeling observation delays.
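To make the recursive belief update described in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation, and independent of Dreamer): a filter that buffers delayed observations, which may arrive out of order, and recomputes the belief by predicting through steps whose observations have not yet arrived and correcting with those that have. The names BeliefFilter, predict, and correct are illustrative assumptions, not identifiers from the paper.

    from typing import Any, Callable, Dict

    class BeliefFilter:
        """Toy recursive belief filter over delayed, out-of-order observations."""

        def __init__(self, init_belief: Any,
                     predict: Callable[[Any, Any], Any],
                     correct: Callable[[Any, Any], Any]):
            self.init_belief = init_belief
            self.predict = predict        # prior step: (belief, action) -> belief
            self.correct = correct        # posterior step: (belief, observation) -> belief
            self.actions: Dict[int, Any] = {}       # action taken at each step
            self.observations: Dict[int, Any] = {}  # observations received so far, keyed by origin step

        def act(self, t: int, action: Any) -> None:
            self.actions[t] = action

        def receive(self, t: int, obs: Any) -> None:
            # Record the observation generated at step t, whenever it arrives.
            self.observations[t] = obs

        def belief_at(self, t_now: int) -> Any:
            # Recompute the belief for step t_now: predict through every step and
            # correct only with the observations that have arrived, in time order.
            b = self.init_belief
            for t in range(t_now):
                b = self.predict(b, self.actions.get(t))
                if t + 1 in self.observations:  # this step's observation may still be in flight
                    b = self.correct(b, self.observations[t + 1])
            return b

    # Tiny usage example with a scalar "belief" (a running estimate).
    bf = BeliefFilter(
        init_belief=0.0,
        predict=lambda b, a: b + (a or 0.0),
        correct=lambda b, o: 0.5 * b + 0.5 * o,
    )
    bf.act(0, 1.0); bf.act(1, 1.0); bf.act(2, 1.0)
    bf.receive(3, 2.7)   # step 3's observation arrives before step 2's
    bf.receive(2, 1.9)
    print(bf.belief_at(3))

A practical system would amortize this recomputation rather than replay the whole history, but the sketch shows the core idea the abstract states: the belief is conditioned on whatever observations have arrived so far, regardless of arrival order.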
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13977