TL;DR: We consider the problem of learning to exploit learning algorithms through repeated interactions in games.
Abstract: We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on repeated two-player, finite-action games, in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of the learner's payoffs. We first show that this is impossible if the optimizer knows only that the learner is using an algorithm from the general class of no-regret algorithms. This suggests that the optimizer requires more information about the learner's objectives or algorithm to exploit them successfully. Building on this intuition, we reduce the optimizer's problem to that of recovering the learner's payoff structure. We demonstrate the effectiveness of this approach when the learner's algorithm is drawn from a smaller class, by analyzing two examples: one in which the learner uses an ascent algorithm, and another in which the learner uses stochastic mirror ascent with known regularizer and step sizes.
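To make the second example concrete, below is a minimal sketch of the learner's side of such an interaction: stochastic mirror ascent on the probability simplex with an entropic regularizer, which reduces to an exponentiated-gradient (multiplicative-weights) update. The payoff matrix `B`, the optimizer's fixed strategy `y`, and the step-size schedule are hypothetical illustrations, not values from the paper.

```python
import numpy as np

def mirror_ascent_update(x, grad, eta):
    """One step of mirror ascent on the simplex with an entropic
    regularizer, i.e., an exponentiated-gradient update."""
    logits = np.log(x) + eta * grad
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

# Hypothetical learner payoff matrix: B[i, j] is the learner's payoff
# when the learner plays action i and the optimizer plays action j.
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])

rng = np.random.default_rng(0)
x = np.ones(2) / 2                 # learner's mixed strategy, uniform start
y = np.array([0.3, 0.7])           # optimizer's strategy (fixed here for illustration)
for t in range(1, 1001):
    j = rng.choice(2, p=y)         # optimizer samples an action
    grad = B[:, j]                 # learner's stochastic payoff gradient
    x = mirror_ascent_update(x, grad, eta=1.0 / np.sqrt(t))

print(x)  # concentrates on action 1, the learner's best response to y
```

With the regularizer and step sizes known, the optimizer can simulate updates of this form against candidate payoff matrices, which is one way to see why recovering the learner's payoff structure becomes the key subproblem.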
Lay Summary: AI is increasingly deployed in rapidly changing real-world settings such as self-driving cars, online advertising auctions, personalized recommendation platforms, high-frequency trading, autonomous logistics routing, and smart-grid control. In these settings, adaptive systems regularly interact with both human users and other algorithms that are simultaneously learning to maximize their own utility. In this paper, we ask how an agent operating in such a competitive, information-limited environment should deliberately deviate from purely myopic learning behavior, without explicit knowledge of the internal mechanics of its co-learners. To study this question, we analyze a simplified model in which two players repeatedly play a finite matrix game over many rounds. Our analysis identifies when such strategic deviations succeed and when they provably fail, offering concrete guidance for researchers and engineers who wish to deploy robust, ethical, and transparent AI systems that interact, negotiate, or compete safely and profitably with people and other machine learners in our increasingly interconnected society.
Primary Area: Theory->Game Theory
Keywords: Learning in games, steering
Submission Number: 8297