Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

Published: 25 Sept 2024, Last Modified: 06 Nov 2024NeurIPS 2024 posterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Multi-Armed Bandit Problems, Strategic Arms, Incentives
Abstract: We examine multi-armed bandit problems featuring strategic arms under debt-free reporting. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by retaining a portion of the observed reward, potentially disclosing only a fraction of it to the player. This scenario unfolds as a game over $T$ rounds, leading to a competition of objectives between the player, aiming to minimize regret, and the arms, motivated by the desire to maximize their individual utilities. To address these dynamics, we propose an algorithm that establishes an equilibrium wherein each arm behaves truthfully and discloses as much of its rewards as possible. Utilizing this algorithm, the player can attain the second-highest average (true) reward among arms, with a cumulative regret bounded by $O(\log(T)/\Delta)$ (problem-dependent) or $O(\sqrt{T\log(T)})$ (worst-case).
Primary Area: Bandits
Submission Number: 3205
Loading