Keywords: Reinforcement Learning, Model Based, Delays, POMDPs
Abstract: Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume immediate feedback from the environment. We study random feedback delays in POMDPs, where observations may arrive out of sequence, a setting that has not previously been addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a filtering process within a model-based RL context that recursively updates the belief state based on incoming observations. We then introduce a simple delay-aware framework that incorporates this idea into RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach with delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to unseen delays at deployment. Additionally, we present experiments on more realistic robotic tasks, evaluating our method against common practical heuristics and emphasizing the importance of explicitly modeling observation delays.
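To make the recursive belief update described in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation, and independent of Dreamer): a filter that buffers delayed observations, which may arrive out of order, and recomputes the belief by predicting through steps whose observations have not yet arrived and correcting with those that have. The names BeliefFilter, predict, and correct are illustrative assumptions, not identifiers from the paper.

    from typing import Any, Callable, Dict

    class BeliefFilter:
        """Toy recursive belief filter over delayed, out-of-order observations."""

        def __init__(self, init_belief: Any,
                     predict: Callable[[Any, Any], Any],
                     correct: Callable[[Any, Any], Any]):
            self.init_belief = init_belief
            self.predict = predict        # prior step: (belief, action) -> belief
            self.correct = correct        # posterior step: (belief, observation) -> belief
            self.actions: Dict[int, Any] = {}       # action taken at each step
            self.observations: Dict[int, Any] = {}  # observations received so far, keyed by origin step

        def act(self, t: int, action: Any) -> None:
            self.actions[t] = action

        def receive(self, t: int, obs: Any) -> None:
            # Record the observation generated at step t, whenever it arrives.
            self.observations[t] = obs

        def belief_at(self, t_now: int) -> Any:
            # Recompute the belief for step t_now: predict through every step and
            # correct only with the observations that have arrived, in time order.
            b = self.init_belief
            for t in range(t_now):
                b = self.predict(b, self.actions.get(t))
                if t + 1 in self.observations:  # this step's observation may still be in flight
                    b = self.correct(b, self.observations[t + 1])
            return b

    # Tiny usage example with a scalar "belief" (a running estimate).
    bf = BeliefFilter(
        init_belief=0.0,
        predict=lambda b, a: b + (a or 0.0),
        correct=lambda b, o: 0.5 * b + 0.5 * o,
    )
    bf.act(0, 1.0); bf.act(1, 1.0); bf.act(2, 1.0)
    bf.receive(3, 2.7)   # step 3's observation arrives before step 2's
    bf.receive(2, 1.9)
    print(bf.belief_at(3))

A practical system would amortize this recomputation rather than replay the whole history, but the sketch shows the core idea the abstract states: the belief is conditioned on whatever observations have arrived so far, regardless of arrival order.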
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13977