Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Published: 15 Aug 2023, Last Modified: 15 Aug 2023, Accepted by TMLR
Abstract: This paper considers the objective of *satisficing* in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case, in which such a satisficing arm exists, as well as for the general case, in which it may not. Introducing the notion of *satisficing regret*, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting but also works well in the classic bandit setting.
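The abstract does not state the formula for satisficing regret. As a hedged illustration only, the sketch below assumes one natural formalization that matches the two regimes described above: with satisfaction level S and best mean μ*, each pull of arm a costs max(min(S, μ*) - μ_a, 0). When a satisficing arm exists, only the shortfall below S is penalized; when none exists, this reduces to the standard per-step regret μ* - μ_a. This definition, the function names `satisficing_regret` and `standard_regret`, and the arm means are assumptions for illustration, not necessarily the paper's exact definitions.

```python
from typing import Sequence


def satisficing_regret(means: Sequence[float], pulls: Sequence[int], level: float) -> float:
    """Cumulative satisficing regret of a sequence of arm pulls (hypothetical helper).

    Assumed definition (illustrative, not taken from the paper): each pull of arm a
    costs max(min(level, mu_star) - mu_a, 0), where mu_star is the largest mean.
    """
    mu_star = max(means)
    # If some arm reaches the level, only the shortfall below the level counts;
    # otherwise the target falls back to the best mean (standard regret).
    target = min(level, mu_star)
    return sum(max(target - means[a], 0.0) for a in pulls)


def standard_regret(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Classic pseudo-regret: gap to the best mean, summed over all pulls."""
    mu_star = max(means)
    return sum(mu_star - means[a] for a in pulls)


if __name__ == "__main__":
    means = [0.2, 0.6, 0.9]  # hypothetical arm means
    # One pull of a poor arm, then commit to arm 1, which is satisficing but not optimal.
    pulls = [0] + [1] * 1000
    print(satisficing_regret(means, pulls, level=0.5))  # stays near 0.3 however long we commit
    print(standard_regret(means, pulls))                # grows linearly, about 300.7
```

Under this assumed definition, committing to any arm at or above the satisfaction level stops the satisficing regret from growing, which is the intuition behind a constant bound in the realizable case. Setting the level above every arm's mean (e.g. `level=1.0` here) makes the two quantities coincide.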
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=he5OT19Yle
Changes Since Last Submission: The revision takes into account all comments and suggestions of the reviewers of the original submission. In particular, we have added:
- a definition of and some intuition for sub-Gaussian distributions,
- more intuition for the presented algorithms,
- more discussion of our results with respect to the existing literature,
- relevant references on thresholding bandits,
- error bars showing the standard error instead of an empirical confidence interval, which had caused confusion in the interpretation of the plots in the previous version,
- an appendix containing an algorithm using potential functions and the respective regret analysis.
All changes are set in blue font to make it easier to compare with the original submission. However, in the course of the revision we have rearranged some of the material, and to meet the page limit we have moved the proof of Theorem 1 to the appendix.
*NB:* We would like to point out that in the review process of the original submission we were asked to submit a revision, which we promised to do. Just yesterday we were suddenly informed, without further notice, that the paper had been rejected after all due to "lack of on-time revision", although we were never given a submission deadline for the revision. Accordingly, the Editor in Chief may want to assign the paper to the same Action Editor, as suggested by the Action Editor himself.
Supplementary Material: zip
Assigned Action Editor: ~Branislav_Kveton1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1242