Diversity-Preserving $K$--Armed Bandits, Revisited

Hedi Hadiji; Sébastien Gerchinovitz; Jean-Michel Loubes; Gilles Stoltz

Published: 14 Jul 2024, Last Modified: 17 Sept 2024. Accepted by TMLR. License: CC BY 4.0

Abstract: We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it in the case of a polytope mainly by a reduction to the setting of linear bandits. We design a UCB algorithm using the specific structure of the setting and show that it enjoys a bounded distribution-dependent regret in the natural cases when the optimal mixed actions put some probability mass on all actions (i.e., when diversity is desirable). The regret lower bounds provided show that otherwise, at least when the model is mean-unbounded, a $\ln T$ regret is suffered. We also discuss an example beyond the special case of polytopes.
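To make the setting concrete, here is a minimal sketch of a generic optimistic (UCB-style) strategy in a diversity-preserving bandit. It is an illustration under stated assumptions, not the authors' algorithm (their code is in the repository linked below). Assumptions: rewards are Bernoulli; the set $\mathcal{P}$ is a polytope given by its list of vertices, here the hypothetical choice $\mathcal{P} = \{p : p_k \geq \ell \text{ for all } k\}$; each round the learner picks a mixed action $p_t \in \mathcal{P}$, an arm is drawn from $p_t$, and only that arm's reward is observed; the per-arm index is the classic UCB1 bonus $\sqrt{2 \ln t / n_k}$. The names `diversity_vertices`, `optimistic_mixed_action` and `run_diversity_ucb` are hypothetical. Maximizing the linear map $p \mapsto \langle p, U_t \rangle$ over a polytope is attained at a vertex, so scanning the vertices suffices.

```python
import math
import random


def diversity_vertices(K, ell):
    """Vertices of the hypothetical polytope {p in simplex : p_k >= ell for all k}.

    Each vertex puts mass 1 - (K - 1) * ell on one arm and ell on every other arm.
    """
    assert 0 <= ell <= 1 / K
    vertices = []
    for k in range(K):
        p = [ell] * K
        p[k] = 1 - (K - 1) * ell
        vertices.append(p)
    return vertices


def optimistic_mixed_action(vertices, index):
    """Maximize p -> <p, index> over the polytope by scanning its vertices."""
    return max(vertices, key=lambda p: sum(pk * uk for pk, uk in zip(p, index)))


def run_diversity_ucb(mu, vertices, T, seed=0):
    """Play T rounds with Bernoulli arms of means mu; return the pseudo-regret
    sum_t (max_{p in P} <p, mu> - <p_t, mu>)."""
    rng = random.Random(seed)
    K = len(mu)
    counts = [0] * K
    sums = [0.0] * K

    def value(p):
        # Expected reward of the mixed action p
        return sum(pk * m for pk, m in zip(p, mu))

    best = max(value(p) for p in vertices)
    regret = 0.0
    for t in range(1, T + 1):
        # UCB1-style index (an assumption), clipped to 1 since rewards lie in [0, 1];
        # unvisited arms get the maximal optimistic value 1.
        index = [
            1.0 if counts[k] == 0
            else min(1.0, sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k]))
            for k in range(K)
        ]
        p = optimistic_mixed_action(vertices, index)
        arm = rng.choices(range(K), weights=p)[0]  # draw the played arm from p_t
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - value(p)
    return regret
```

For instance, `run_diversity_ucb([0.3, 0.5, 0.7], diversity_vertices(3, 0.1), 2000)` returns the cumulative pseudo-regret of the sketch. With $\ell > 0$, every vertex samples every arm with probability at least $\ell$, so all estimates keep improving whichever mixed action is played; intuitively, this is why no explicit exploration cost need be paid when the optimal mixed actions put mass on all arms.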

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: For the camera-ready version, we revised our article and incorporated the changes suggested and discussed in the thread below. In particular, in the order of the article:
- we extended the literature review to incorporate the references on mediator feedback and those extending diversity preservation to the context of MDPs;
- we added comments to Theorem 1 (the regret upper bounds): that the main achievement is the bounded-regret case, not the $\ln T$ rate, and that the latter $\ln T$ bound does not specialize to the classic UCB bound; we also provided an overview of and intuition for the proof techniques, highlighting the adaptations made, together with a comparison to how bounded regret arises for structured bandits;
- we thoroughly reworked the exposition of Section 3 (the regret lower bounds and optimality): we first describe our high-level aims and only then dig into the technical details, which we broke down into two subsections (the lower bound, then the discussion of optimality); we also added the proposed example of a mean-unbounded model and corrected the statement of Assumption 1;
- we added a paragraph at the beginning of Section 4 (an example of a diversity-preserving set $\mathcal{P}$ not given by a polytope) explaining that we only deal with an example there, to illustrate that rates between a constant and $\ln T$ may be achieved, but that building a general theory is left for future research;
- we added a section with the simple numerical experiments submitted during the discussion period (and corrected our implementation of L1-OFUL, which now exhibits logarithmic regret in both cases);
- we added a conclusion section pointing out the limitations (i.e., what is left for future research);
- we incorporated the corrections of the unfortunate typos spotted in Section A.1 in the course of the reviewing process.

Code: https://github.com/H2DI/diversity_preserving_band_sim

Supplementary Material: zip

Assigned Action Editor: ~Lijun_Zhang1

Submission Number: 2477