Improved Regret Bounds in Stochastic Contextual Bandits with Graph Feedback

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: side-observations, probabilistic feedback, gap-dependent upper bound
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: This paper investigates the stochastic contextual bandit problem with a general function space and graph feedback. We propose a novel algorithm that effectively adapts to time-varying graph structures, leading to improved regret bounds in stochastic settings compared with existing approaches. Notably, our method requires neither prior knowledge of graph parameters nor online regression oracles, making it highly practical. Furthermore, our algorithm can be modified to derive a gap-dependent upper bound on the regret, addressing a significant gap in this line of research. Extensive numerical experiments validate our findings, showcasing the adaptability of our approach to graph feedback settings. The numerical results demonstrate that the regret of our method scales with graph parameters rather than the size of the action set. This algorithmic advancement in stochastic contextual bandits with graph feedback has practical implications in various domains.
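To make the feedback model concrete: in bandits with graph (side-observation) feedback, pulling an arm reveals noisy rewards not only for that arm but also for its neighbors in a feedback graph, which is why regret can scale with graph quantities (e.g., independence or dominating numbers) instead of the number of arms. The sketch below is only an illustration of this feedback model with a placeholder epsilon-greedy policy; it is not the paper's algorithm, and all names (`simulate_graph_bandit`, `means`, `graph`) are hypothetical.

```python
import random

def simulate_graph_bandit(means, graph, horizon, seed=0):
    """Toy stochastic bandit with graph feedback.

    Pulling arm `a` reveals a noisy reward for `a` and for every
    neighbor in graph[a] (side observations). The policy here is a
    simple epsilon-greedy placeholder, NOT the paper's algorithm.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n      # number of observations per arm
    sums = [0.0] * n      # sum of observed rewards per arm
    regret = 0.0
    best = max(means)
    for _ in range(horizon):
        if rng.random() < 0.1 or all(c == 0 for c in counts):
            a = rng.randrange(n)  # explore
        else:
            a = max(range(n),
                    key=lambda i: sums[i] / counts[i] if counts[i] else 0.0)
        regret += best - means[a]
        # graph feedback: observe the pulled arm and all its neighbors
        for j in {a} | set(graph[a]):
            sums[j] += means[j] + rng.gauss(0, 0.1)
            counts[j] += 1
    return regret, counts
```

With a denser feedback graph, each pull yields more observations, so every arm's estimate improves faster even when that arm is rarely pulled; in the extreme of a complete graph, every round observes all arms, which is the intuition behind regret bounds driven by graph parameters.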
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1984