High-Dimensional Prediction for Sequential Decision Making

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 oral, CC BY 4.0
TL;DR: A general framework for online adversarial vector-valued prediction with decision making applications
Abstract: We give an efficient algorithm for producing multi-dimensional forecasts in an online adversarial environment that have low bias subject to any polynomial number of conditioning events, which can depend both on external context and on our predictions themselves. We demonstrate the use of this algorithm with several applications. We show how to make predictions that can be transparently consumed by any polynomial number of downstream decision makers with different utility functions, guaranteeing them diminishing swap regret at optimal rates. We also give the first efficient algorithms for guaranteeing diminishing conditional regret in online combinatorial optimization problems for an arbitrary polynomial number of conditioning events, i.e., on an arbitrary number of intersecting subsequences determined both by context and by our own predictions. Finally, we give the first efficient algorithm for online multicalibration with $O(T^{2/3})$ rates in the ECE metric.
Lay Summary: Consider one or more agents who must sequentially take actions in a nonstationary (possibly adversarially changing) environment; each action earns some reward in each round of the interaction. (For instance, imagine the online routing game in which people living in the same city each want to get from home to work as fast as possible each morning.) A variety of online learning algorithms can directly optimize various performance benchmarks for any one agent, e.g. guaranteeing that the agent's actions earn cumulative reward at least as high as if the agent had always played the best fixed action. However, most such algorithms have one or more of these undesirable traits: (1) they do not offer guarantees to multiple agents simultaneously, (2) they are very inefficient when the action space is very large (e.g. when the number of home-work routes is exponentially large), and (3) they cannot give these performance guarantees conditionally on various relevant events (i.e. they do not offer guarantees specifically on days when it rains, or on national holidays, or on days when an agent uses a particular road, etc.). We develop an algorithmic framework that can solve all these issues in a large variety of sequential settings such as the one above! It lets us issue a single coordinated vector forecast each day that is appropriately unbiased/calibrated, so that all agents who trust and use our forecast get strong performance guarantees.
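To make "unbiased conditionally on events" concrete, here is a minimal sketch of how one might measure the bias of a sequence of vector forecasts on the rounds where a given event held. This page does not state the paper's exact bias definition, so the choice of the max-coordinate (l-infinity) norm, the function name `conditional_bias`, and the toy data are all assumptions made for illustration only.

```python
from typing import Sequence


def conditional_bias(
    preds: Sequence[Sequence[float]],
    outcomes: Sequence[Sequence[float]],
    event_mask: Sequence[bool],
) -> float:
    """Hypothetical helper: bias of forecasts restricted to one event.

    Sums the residuals (prediction minus outcome) over the rounds where
    the event held, then returns the largest absolute coordinate of that
    sum. The norm is an assumption, not taken from the paper.
    """
    dim = len(preds[0])
    totals = [0.0] * dim
    for p, y, active in zip(preds, outcomes, event_mask):
        if active:
            for i in range(dim):
                totals[i] += p[i] - y[i]
    return max(abs(v) for v in totals)


# Toy data: 4 rounds, 2-dimensional forecasts (e.g. delays on two roads).
preds = [[0.5, 0.2], [0.5, 0.2], [0.9, 0.1], [0.1, 0.8]]
outcomes = [[1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# An event defined by external context (say, "it rained that day").
rained = [True, True, False, False]

# An event defined by the forecast itself (first coordinate above 0.4).
high_first = [p[0] > 0.4 for p in preds]

bias_rain = conditional_bias(preds, outcomes, rained)
bias_high = conditional_bias(preds, outcomes, high_first)
```

The two masks mirror the two kinds of conditioning events named in the abstract: those determined by external context and those determined by the predictions themselves. A low-bias guarantee in this sense would ask that every such quantity grow sublinearly in the number of rounds.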
Primary Area: Theory->Online Learning and Bandits
Keywords: online decision making, combinatorial optimization, multicalibration, calibration, swap regret, no-regret, conditional guarantees
Submission Number: 13513