Abstract: We design a multi-agent, networked policy gradient algorithm for Markov potential games. Each agent has its own reward and utility, both functions of the joint actions and of a state shared among the agents; the state dynamics depend on the joint actions taken. Differentiable Markov potential games are defined by the existence of a potential (value) function whose partial gradients equal the local gradients of the agents' individual value functions. Agents implement continuous parameterized policies, defined over the state and the other agents' parameters, to maximize their utilities against each other. Each agent computes a stochastic policy gradient with respect to its local Q-function estimate and the joint parameters, updates its own parameters accordingly, and shares the updated parameters with its neighbors over a time-varying network. We prove convergence in probability of the joint parameters to a first-order stationary point of the potential function for general state and action spaces. Numerical results illustrate the potential advantages of networked policies over independent policies.
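The following is a minimal sketch, not the paper's exact algorithm, of one round of the networked policy gradient play described above. It assumes an illustrative linear-Gaussian policy parameterization, a placeholder scalar in place of the local Q-function estimate, and a fully connected, doubly stochastic mixing matrix `W`; all of these are assumptions for illustration, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def consensus_step(copies, W):
    """Mix each agent's local copies of everyone's parameters with its
    neighbors' copies using a doubly stochastic weight matrix W
    (the matrix may change over time for a time-varying network)."""
    # copies[i, j] = agent i's current estimate of agent j's parameters.
    return np.einsum("ik,kjd->ijd", W, copies)

def gaussian_score_gradient(theta_i, state, action_i, q_hat, sigma=1.0):
    """Score-function (REINFORCE-style) stochastic gradient of agent i's
    value w.r.t. its own parameters, weighted by a local Q estimate q_hat.
    Assumes a Gaussian policy with mean theta_i @ state (illustrative only;
    the paper allows general continuous parameterized policies)."""
    mean = theta_i @ state
    return q_hat * np.outer((action_i - mean) / sigma**2, state)

# One illustrative round with 3 agents, 4-dimensional state, 2-dimensional actions.
n, sdim, adim, step = 3, 4, 2, 1e-2
copies = rng.normal(size=(n, n, adim * sdim))  # flattened parameter copies
W = np.full((n, n), 1.0 / n)                   # fully connected mixing (assumption)
state = rng.normal(size=sdim)

for i in range(n):
    theta_i = copies[i, i].reshape(adim, sdim)
    action_i = theta_i @ state + rng.normal(size=adim)  # sample from own policy
    q_hat = rng.normal()                                 # placeholder local Q estimate
    grad = gaussian_score_gradient(theta_i, state, action_i, q_hat)
    copies[i, i] += step * grad.ravel()                  # local ascent on own block

copies = consensus_step(copies, W)  # exchange parameter copies with neighbors
```

In this sketch, each agent ascends only its own block of parameters and relies on the consensus step to track the other agents' parameters, mirroring the share-with-neighbors structure described in the abstract.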