Balancing Adaptability and Non-exploitability in Repeated Games

Anthony DiGiovanni; Ambuj Tewari

Published: 20 May 2022, Last Modified: 20 Apr 2025 | UAI 2022 Poster | Readers: Everyone

Keywords: multi-agent learning, bargaining, repeated games, Pareto efficiency, general sum

TL;DR: Our algorithm provably learns best responses to several opponent classes, and deters exploitative opponents.

Abstract: We study the problem of adaptability in repeated games: simultaneously guaranteeing low regret for several classes of opponents. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some “fair” value. Our solution is an expert algorithm (LAFF), which searches within a set of sub-algorithms that are optimal for each opponent class, and punishes evidence of exploitation by switching to a policy that enforces a fair solution. With benchmarks that depend on the opponent class, we first show that LAFF has sublinear regret uniformly over these classes. Second, we show that LAFF discourages exploitation, because exploitative opponents have linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.
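As a rough illustration of the idea in the abstract (try sub-algorithms suited to different opponent classes, and fall back to a fairness-enforcing policy when payoffs suggest exploitation), here is a toy sketch. It is not LAFF itself: the epoch structure, the threshold test, and every name in it (`toy_expert_loop`, `targets`, `fair_value`, `tol`, `pd_payoff`) are assumptions made for illustration, and the paper's actual conditions and guarantees are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Policy = Callable[[int], int]  # hypothetical: round index -> action

# Assumed toy payoffs for the row player in a prisoner's dilemma
# (action 0 = cooperate, action 1 = defect).
PD: Dict[Tuple[int, int], float] = {
    (0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0,
}


def pd_payoff(ours: int, theirs: int) -> float:
    return PD[(ours, theirs)]


def toy_expert_loop(
    experts: Sequence[Policy],
    targets: Sequence[float],
    fallback: Policy,
    payoff: Callable[[int, int], float],
    opponent: Policy,
    fair_value: float,
    rounds: int,
    epoch_len: int = 20,
    tol: float = 0.1,
) -> List[str]:
    """Play experts in order; return the name of the policy used each round.

    At the end of each epoch (assumed rule, not the paper's):
      * average payoff below fair_value - tol is treated as evidence of
        exploitation, and the fallback policy is used from then on;
      * otherwise, average payoff below the current expert's target
        moves on to the next expert, if one remains.
    """
    log: List[str] = []
    idx = 0
    punishing = False
    epoch_sum = 0.0
    for t in range(rounds):
        policy = fallback if punishing else experts[idx]
        log.append("fallback" if punishing else f"expert{idx}")
        epoch_sum += payoff(policy(t), opponent(t))
        if (t + 1) % epoch_len == 0:
            avg = epoch_sum / epoch_len
            epoch_sum = 0.0
            if punishing:
                continue  # the switch is permanent in this sketch
            if avg < fair_value - tol:
                punishing = True
            elif avg < targets[idx] - tol and idx + 1 < len(experts):
                idx += 1
    return log


# Usage: a cooperating expert meets an always-defect opponent, earns 0 per
# round in the first epoch (below the assumed fair value of 3), and the loop
# switches to the always-defect fallback for the remaining rounds.
log = toy_expert_loop(
    experts=[lambda t: 0],
    targets=[3.0],
    fallback=lambda t: 1,
    payoff=pd_payoff,
    opponent=lambda t: 1,
    fair_value=3.0,
    rounds=60,
)
```

In this sketch the fallback is triggered by a single low-payoff epoch, which is a deliberate simplification; a real algorithm would need a statistically careful test to keep regret sublinear against non-exploitative opponents.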

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/balancing-adaptability-and-non-exploitability/code)
