Collaborative Learning under Strategic Behavior: Mechanisms for Eliciting Feedback in Principal-Agent Bandit Games

Published: 18 Jun 2024 · Last Modified: 16 Jul 2024 · Agentic Markets @ ICML'24 Poster · CC BY 4.0
Keywords: Collaborative learning, multi-armed bandits, strategic behavior, mechanism design
Abstract: Collaborative multi-armed bandits (MABs) have emerged as a promising framework that allows multiple agents to share the \emph{burden of exploration} and find optimal actions for a common problem. While several recent results demonstrate the benefit of collaboration in minimizing per-agent regret, prior work on collaborative MABs primarily relies on the assumption that all participating agents behave truthfully. The case of \emph{strategic} agent behavior, where an agent may \emph{free-ride} on the information shared by others without performing any exploration of its own, has received limited attention in the collaborative MAB setting; such free-riding strategies can lead to a collapse in exploration, resulting in high regret for all honest agents. This paper addresses the problem of collaborative multi-armed bandits in the presence of strategic agent behavior. Our main contribution is to design mechanisms for penalizing agents so that truthful behavior, i.e., performing sufficient exploration and reporting feedback accurately, is a Nash equilibrium. Furthermore, under this Nash equilibrium, the per-agent regret with collaboration is a factor of $\smash{\sqrt{M}}$ smaller than the per-agent regret without collaboration, where $M$ is the number of agents. Our results establish that it is possible to realize the benefit of collaboration even in the presence of strategic agents who may want to free-ride. Semi-synthetic experiments confirm that our theoretical results also hold empirically.
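To make the benefit of sharing the exploration burden concrete, here is a minimal sketch under assumed parameters; it is not the paper's penalty mechanism. The explore-then-commit strategy, the Bernoulli arms, and all names (e.g., `pooled_means`, `n`) are illustrative assumptions: $M$ truthful agents each pull every arm a few times, pool their estimates, and commit, so each agent pays roughly a $1/M$ share of the exploration cost a solo learner would.

```python
import numpy as np

# Illustrative sketch (not the paper's mechanism): M agents share the burden
# of exploring K Bernoulli arms via explore-then-commit. Each agent pulls
# every arm n times and shares its empirical means; pooling yields M*n
# samples per arm, so each agent explores far less than a solo learner.
rng = np.random.default_rng(0)
K, M, T, n = 5, 10, 10_000, 20      # arms, agents, horizon, pulls/arm/agent (assumed)
mu = rng.uniform(0.2, 0.8, size=K)  # unknown arm means
best = mu.max()

# Exploration phase: every agent samples every arm n times.
pulls = rng.binomial(1, mu, size=(M, n, K))  # (agent, pull, arm) rewards
local_means = pulls.mean(axis=1)             # each agent's own estimates

# Truthful sharing: pool all agents' estimates. A free-rider could skip
# exploration yet still consume this pool -- the incentive problem the
# paper's penalty mechanism is designed to address.
pooled_means = local_means.mean(axis=0)      # based on M*n samples per arm

# Commit phase: every agent plays the empirically best arm until horizon T.
arm = pooled_means.argmax()
explore_regret = n * (best - mu).sum()           # per-agent exploration cost
commit_regret = (T - n * K) * (best - mu[arm])   # cost of a wrong commitment
print(f"per-agent regret ~ {explore_regret + commit_regret:.1f}")
```

With truthful pooling, each agent's exploration cost shrinks with $M$ while the pooled estimates stay as accurate as a solo learner's, which is the intuition behind the $\smash{\sqrt{M}}$ improvement; the paper's contribution is the penalty mechanism that makes this truthful pooling a Nash equilibrium.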
Submission Number: 30