Keywords: Online Learning, Equilibrium Finding, Ranking Feedback
Abstract: Online learning in arbitrary and possibly adversarial environments has been extensively studied in sequential decision-making, with a strong connection to equilibrium computation in game theory. Most existing online learning algorithms rely on \emph{numeric} utility feedback from the environment, which may be unavailable in applications with humans in the loop and/or in the presence of privacy concerns. In this paper, we study an online learning model in which only a \emph{ranking} of a set of proposed actions is provided to the learning agent at each timestep. We consider ranking models based on either the \emph{instantaneous} utility at each timestep or the \emph{time-average} utility up to the current timestep, in both the \emph{full-information} and \emph{bandit} feedback settings. Focusing on the standard (external-)regret metric, we show that sublinear regret cannot be achieved in general with instantaneous-utility ranking feedback. Moreover, we show that when the ranking model is relatively \emph{deterministic} (\emph{i.e.,} has a small temperature), sublinear regret cannot be achieved with time-average-utility ranking feedback, either. We then propose new algorithms that achieve sublinear regret under the additional assumption that the utility vectors have sublinear variation. Notably, we also show that when time-average-utility ranking is used, this additional assumption can be avoided in the full-information setting. As a consequence, if all players follow our algorithms, an approximate coarse correlated equilibrium of a normal-form game can be found through repeated play. Finally, we validate the effectiveness of our algorithms via numerical experiments.
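To make the feedback model concrete, the sketch below illustrates one common way a temperature-controlled ranking over proposed actions could be generated from (hidden) utilities. The Plackett-Luce sampling rule, the function name `sample_ranking`, and its parameters are illustrative assumptions for this sketch, not the specific ranking model analyzed in the paper; the point is only that a small temperature makes the observed ranking nearly deterministic in the true utility order, while the learner never sees the numeric utilities.

```python
import numpy as np

def sample_ranking(utilities, temperature=1.0, rng=None):
    """Sample a ranking of actions from hidden utilities (Plackett-Luce style).

    Smaller temperature -> the ranking concentrates on the true utility order
    (the 'relatively deterministic' regime); larger temperature -> noisier ranking.
    This is an assumed illustration, not necessarily the paper's ranking model.
    """
    rng = np.random.default_rng() if rng is None else rng
    remaining = list(range(len(utilities)))
    ranking = []
    while remaining:
        # Pick the next-ranked action with probability proportional to
        # exp(utility / temperature) among the actions not yet ranked.
        scores = np.array([utilities[i] / temperature for i in remaining])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        pick = rng.choice(len(remaining), p=probs)
        ranking.append(remaining.pop(pick))
    return ranking  # action indices, best-ranked first

# The learner observes only this ranking at each timestep,
# never the numeric utilities themselves.
ranking = sample_ranking([0.2, 0.9, 0.5], temperature=0.1)
```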
Submission Type: Long Paper (9 Pages)
Archival Option: This is a non-archival submission
Presentation Venue Preference: ICLR 2025
Submission Number: 52