Remote Reinforcement Learning with Communication Constraints

Szymon Kobus; Deniz Gunduz

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. CC BY 4.0

Keywords: reinforcement learning, communication, source coding, compression, sampling, channel simulation

TL;DR: Solve reinforcement learning problems where only the controller observes the reward and communicates with the actor(s) over a capacity-constrained channel.

Abstract: We introduce the novel problem of remote reinforcement learning (RRL) with a communication constraint, in which the actor that takes actions in the environment lacks direct access to the reward signal. Instead, the rewards are observed by a controller, which communicates with the actor over a capacity-constrained channel. This models, for example, a remote control scenario over a wireless channel, where the link from the controller to the actor has limited capacity due to power, bandwidth, or delay constraints. In the proposed solution, rather than transmitting the reward values over the rate-limited channel, the controller learns the optimal policy and, at each round, signals to the actor the action it should take. However, instead of sending the precise action, which can be prohibitive when the action set is large, we use an importance sampling approach that reduces the communication load while still allowing the actor to sample an action from the current policy. The actor, which samples from the desired policy at each turn, can also learn the optimal policy through supervised learning, albeit at a slower pace. We exploit the policy learned at the actor to further reduce the communication load. Our solution, called Guided Remote Action Sampling Policy (GRASP), significantly reduces communication requirements, achieving an average 12-fold decrease in data transmission across all experiments and a 50-fold reduction for environments with continuous action spaces. We also show the applicability of GRASP beyond single-agent scenarios, including parallel and multi-agent environments.
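To make the importance sampling idea in the abstract concrete, here is a minimal sketch of one generic way to signal a sample rather than an exact action. It is not the GRASP algorithm itself. It assumes a small discrete action set, a random seed shared by the controller and the actor, and a proposal distribution (standing in for the actor's own learned policy) known to both sides. All function and variable names are hypothetical.

```python
import math
import random


def shared_candidates(proposal, n_candidates, seed):
    """Draw candidate actions from the proposal using a shared seed.

    Both sides call this with the same arguments, so they obtain the
    same candidate list without any communication.
    """
    rng = random.Random(seed)
    actions = list(range(len(proposal)))
    return rng.choices(actions, weights=proposal, k=n_candidates)


def controller_encode(target, proposal, n_candidates, seed, rng):
    """Pick a candidate index with probability proportional to target/proposal.

    Assumes the target policy puts positive mass on at least one candidate;
    otherwise all importance weights are zero and sampling fails.
    """
    candidates = shared_candidates(proposal, n_candidates, seed)
    weights = [target[a] / proposal[a] for a in candidates]
    return rng.choices(range(n_candidates), weights=weights, k=1)[0]


def actor_decode(index, proposal, n_candidates, seed):
    """Regenerate the shared candidates and return the signalled one."""
    return shared_candidates(proposal, n_candidates, seed)[index]


# Illustrative numbers only (assumptions, not taken from the paper).
proposal = [0.25, 0.25, 0.25, 0.25]   # actor-side policy, known to both
target = [0.70, 0.10, 0.10, 0.10]     # controller-side policy
n_candidates = 16
seed = 1234

index = controller_encode(target, proposal, n_candidates, seed, random.Random(0))
action = actor_decode(index, proposal, n_candidates, seed)
bits_per_step = math.log2(n_candidates)
print(f"index={index}, action={action}, bits={bits_per_step}")
```

In this sketch the message is only the candidate index, so its length depends on the number of candidates and not on the size of the action set. The selected action is only approximately distributed according to the target policy, and the approximation improves as the proposal moves closer to the target, which is consistent with the abstract's point that a better policy at the actor reduces the communication load.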

Primary Area: reinforcement learning

Submission Number: 11807
