Keywords: Differentiable games, multi-agent reinforcement learning, general-sum games, LOLA
Abstract: Optimization problems with multiple, interdependent losses, as they arise in Generative Adversarial Networks (GANs) or multi-agent reinforcement learning (RL), are commonly formalized as differentiable games.
Learning with Opponent-Learning Awareness (LOLA) introduced opponent shaping to this setting: an augmented learning rule under which an agent accounts for its influence on the anticipated learning step of the other agents. However, the original LOLA formulation is inconsistent, because it models the other agents as naive learners rather than as LOLA agents.
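As a minimal sketch of the look-ahead idea (in our own notation, not the paper's exact formulation): for two agents with parameters $\theta_1, \theta_2$, losses $L_1, L_2$, and step sizes $\alpha, \eta$, agent 1 anticipates the opponent's naive gradient step and differentiates its own loss through it,
$$
\Delta\theta_2^{\text{naive}} = -\eta\,\nabla_{\theta_2} L_2(\theta_1, \theta_2), \qquad
\Delta\theta_1^{\text{LOLA}} = -\alpha\,\nabla_{\theta_1} L_1\big(\theta_1,\, \theta_2 + \Delta\theta_2^{\text{naive}}\big),
$$
where the original LOLA rule uses a first-order Taylor expansion of this shaped loss. The anticipated opponent step is a naive gradient step, which is exactly the inconsistency noted above.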
Previous work identified this inconsistency as the root cause of LOLA's failure to preserve stable fixed points (SFPs). We provide a counterexample to this claim by investigating cases in which Higher-Order LOLA (HOLA) converges.
Furthermore, we show that, contrary to claims in the literature, Competitive Gradient Descent (CGD) does not solve the consistency problem.
Next, we propose a new method called Consistent LOLA (COLA), which learns update functions that are consistent under mutual opponent shaping. Lastly, we empirically compare the performance and consistency of HOLA, LOLA, and COLA on a set of general-sum learning games.
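As a minimal sketch of the consistency requirement (again in our own notation, assuming learned update functions $f_1, f_2$ and the losses defined above): each agent's update should equal the gradient step it would take if the other agent simultaneously applied its own learned update, e.g.
$$
f_1(\theta_1, \theta_2) = -\alpha\,\nabla_{\theta_1} L_1\big(\theta_1,\, \theta_2 + f_2(\theta_1, \theta_2)\big),
$$
and symmetrically for $f_2$, so that neither agent models the other as a naive learner.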
Supplementary Material: zip