Communicating via Markov Decision Processes

Samuel Sokota; Christian Schroeder de Witt; Maximilian Igl; Luisa M Zintgraf; Philip Torr; J Zico Kolter; Shimon Whiteson; Jakob Nicolaus Foerster

Published: 28 Jan 2022, Last Modified: 13 Feb 2023; ICLR 2022 Submitted; Readers: Everyone

Keywords: coding, communication, maximum entropy reinforcement learning, minimum entropy coupling

Abstract: We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap talk is not available: they require balancing communication against the associated cost of communicating. We contribute a theoretically grounded approach to MCGs, based on maximum entropy reinforcement learning and minimum entropy coupling, that we call greedy minimum entropy coupling (GME). We show both that GME outperforms a relevant baseline on small MCGs and that it scales efficiently to extremely large MCGs. On the latter point, we demonstrate that GME can losslessly communicate binary images via trajectories of Cartpole and Pong while simultaneously achieving maximal or near-maximal expected returns, and that it performs well even in the presence of actuator noise.

One-sentence Summary: Proposes and investigates a problem setting in which the goal is to communicate exogenous information via MDP trajectories.

Supplementary Material: zip
