Readers: Everyone (since 13 Oct 2023)
This paper studies the stochastic contextual bandit problem with a general function space and graph feedback. We propose an algorithm that adapts to time-varying graph structures and achieves improved regret bounds in stochastic settings compared with existing approaches. Notably, our method requires neither prior knowledge of graph parameters nor an online regression oracle, which makes it practical. The algorithm can also be modified to yield a gap-dependent upper bound on regret, addressing a gap in the existing literature. Numerical experiments support these findings: the regret of our method scales with graph parameters rather than with the size of the action set, demonstrating its adaptability to graph feedback settings. These results have practical implications for applications of stochastic contextual bandits with graph feedback.
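To make the notion of graph feedback concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes a feedback graph stored as a dictionary mapping each action to the set of actions whose losses it reveals; the names `observed_actions`, `greedy_dominating_set`, and the `star` example graph are hypothetical. It shows why a quantity tied to the graph, rather than the number of actions, can govern how much exploration is needed: in a star graph, playing the single central action reveals the losses of all actions.

```python
from typing import Dict, List, Set

# Assumed representation: action -> set of other actions whose loss it reveals.
Graph = Dict[int, Set[int]]


def observed_actions(graph: Graph, action: int) -> Set[int]:
    """Actions whose losses are revealed when `action` is played.

    Assumes every action observes its own loss (self-loop).
    """
    return graph[action] | {action}


def greedy_dominating_set(graph: Graph) -> List[int]:
    """Greedily pick actions until every action is observed by some pick.

    This is a standard greedy heuristic, used here only for illustration.
    """
    uncovered = set(graph)
    chosen: List[int] = []
    while uncovered:
        # Pick the action revealing the most still-unobserved actions;
        # ties are broken in favor of the smallest action index.
        best = max(
            sorted(graph),
            key=lambda a: len(observed_actions(graph, a) & uncovered),
        )
        chosen.append(best)
        uncovered -= observed_actions(graph, best)
    return chosen


# Hypothetical 5-action star graph: action 0 reveals every other action.
star: Graph = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

# Hypothetical graph with no edges: this is the standard bandit setting,
# where each action reveals only its own loss.
no_edges: Graph = {0: set(), 1: set(), 2: set()}
```

On the star graph a single action suffices to observe all five losses, whereas the edgeless graph requires playing every action, which matches the standard bandit case.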