Keywords: Reinforcement Learning, Random Delays, Value Correction, SAC
Abstract: Although deep reinforcement learning (DRL) has achieved great success across various domains, the presence of random delays in real-world scenarios (e.g., remote control) poses a significant challenge to its practicality. Existing delay-aware DRL methods mainly focus on state augmentation with historical memory, ensuring that the actions taken are aligned with the true state. However, these approaches still rely on the conventional expected $Q$ value. In contrast, to model delay uncertainty, we go beyond the expected value and propose a distributional DRL approach that represents the full distribution of the $Q$ value. Based on the delay distribution, we further propose a correction mechanism for the distributional $Q$ value, enabling the agent to learn accurate returns in delayed environments. Finally, we apply these techniques to design the delay-aware distributional actor-critic (DADAC) DRL framework, in which the critic is the corrected distributional value function. Experimental results demonstrate that, compared with state-of-the-art delay-aware DRL methods, the proposed DADAC exhibits substantial performance advantages in handling random delays on MuJoCo continuous control tasks. The corresponding source code is available at https://anonymous.4open.science/r/DADAC.
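To make the abstract's idea of a delay-corrected distributional critic concrete, the following is a minimal illustrative sketch, not the authors' DADAC implementation: it assumes a quantile-based critic over an augmented state (observed state plus action buffer) and a hypothetical correction that mixes next-state quantile estimates under an assumed discrete delay distribution. All class names, function names, and the mixing rule are illustrative assumptions.

```python
# Hypothetical sketch of a delay-corrected distributional critic.
# Names (QuantileCritic, delay_corrected_target) and the delay-mixing rule
# are assumptions for illustration, not the paper's exact mechanism.
import torch
import torch.nn as nn


class QuantileCritic(nn.Module):
    """Maps an augmented state (state + buffered actions) and action to N quantiles of Z(s, a)."""

    def __init__(self, aug_state_dim, action_dim, n_quantiles=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(aug_state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )

    def forward(self, aug_state, action):
        # Returns (batch, n_quantiles) quantile estimates of the return distribution.
        return self.net(torch.cat([aug_state, action], dim=-1))


def delay_corrected_target(next_quantiles_per_delay, delay_probs, reward, gamma, done):
    """Form a Bellman target by mixing next-state quantiles over an assumed delay distribution.

    next_quantiles_per_delay: (batch, n_delays, n_quantiles), quantiles evaluated
        under each candidate delay value.
    delay_probs: (n_delays,) assumed probability mass function of the random delay.
    """
    mixed = (delay_probs.view(1, -1, 1) * next_quantiles_per_delay).sum(dim=1)
    return reward.unsqueeze(-1) + gamma * (1.0 - done).unsqueeze(-1) * mixed
```

Under these assumptions, the corrected target would replace the usual expected-value Bellman target in a SAC-style critic update (e.g., with a quantile regression loss); the actual correction used by DADAC is described in the paper itself.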
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9578