Abstract: The framework of uncoupled online learning in multiplayer games has made significant progress in recent years. In particular, the development of time-varying games has considerably expanded its modeling capabilities. However, current regret bounds quickly become vacuous when the game undergoes significant variations over time, even when these variations are easy to predict. Intuitively, the ability of players to forecast future payoffs should lead to tighter guarantees, yet existing approaches fail to incorporate this aspect. This work aims to fill this gap by introducing a novel prediction-aware framework for time-varying games, where agents can forecast future payoffs and adapt their strategies accordingly. In this framework, payoffs depend on an underlying state of nature that agents predict in an online manner. To leverage these predictions, we propose the POMWU algorithm, a contextual extension of the Optimistic Multiplicative Weights Update (OMWU) algorithm, for which we establish theoretical guarantees on social welfare and convergence to equilibrium. Our results demonstrate that, under bounded prediction errors, the proposed framework achieves performance comparable to the static setting. Finally, we empirically demonstrate the effectiveness of POMWU in a traffic routing experiment.
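To illustrate the kind of update the abstract refers to, the sketch below shows one generic optimistic multiplicative-weights step; it is not the authors' POMWU algorithm, and the function name, the payoff forecast, and the learning rate are placeholders. In vanilla OMWU the forecast term is simply the last observed payoff vector, whereas a prediction-aware variant would instead derive it from the agent's forecast of the underlying state of nature.

import numpy as np

def optimistic_mwu_step(cumulative_payoff, predicted_payoff, eta):
    """One optimistic multiplicative-weights step (illustrative sketch).

    cumulative_payoff : sum of observed payoff vectors u_1 + ... + u_t
    predicted_payoff  : forecast m_{t+1} of the next payoff vector
                        (u_t in vanilla OMWU; in a prediction-aware
                        variant, a payoff forecast computed from the
                        predicted state of nature)
    eta               : learning rate
    """
    logits = eta * (cumulative_payoff + predicted_payoff)
    logits -= logits.max()              # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()      # mixed strategy x_{t+1}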
Lay Summary: This research looks at how people (or computer programs) learn to make decisions in changing environments where they don't know everything about what others are doing, like drivers choosing routes in busy traffic without knowing which roads others will take. Traditionally, these types of decision-making systems struggle when the situation changes a lot over time, even if those changes are predictable. This paper introduces a new approach that allows decision-makers to use their predictions about future changes to make better choices. We create a new method (called POMWU) that helps these decision-makers adjust their strategies based on what they expect will happen. We show that when predictions are fairly accurate, our method performs just as well as if the situation weren't changing at all. We also test it in a traffic routing simulation and find that it works well in practice.
Primary Area: Theory->Online Learning and Bandits
Keywords: Learning in Games, Multi-agent systems, Online Learning, Equilibrium, Social Welfare, Time-varying games
Submission Number: 7079