Langevin Monte Carlo for Contextual Bandits

ICML 2022 (modified: 15 Nov 2022)
Abstract: We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the pos...
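The abstract contrasts Laplace approximation (fitting a Gaussian to the posterior) with the Langevin Monte Carlo approach named in the title. As an illustrative sketch only, not the paper's algorithm, unadjusted Langevin dynamics draws approximate posterior samples by following the gradient of the log posterior plus Gaussian noise; the toy 1-D Gaussian target and all names below are assumptions for illustration:

```python
import math
import random

def langevin_step(theta, grad_log_post, step, rng):
    # Unadjusted Langevin update:
    #   theta' = theta + step * grad log p(theta) + sqrt(2 * step) * N(0, 1)
    return theta + step * grad_log_post(theta) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)

# Toy posterior: N(mu, sigma^2). In a contextual bandit this would be the
# posterior over the reward-model parameter (hypothetical stand-in here).
mu, sigma = 1.0, 0.5
grad_log_post = lambda th: -(th - mu) / sigma ** 2

rng = random.Random(0)
theta, samples = 0.0, []
for t in range(20000):
    theta = langevin_step(theta, grad_log_post, step=1e-3, rng=rng)
    if t >= 2000:  # discard burn-in before collecting samples
        samples.append(theta)

mean = sum(samples) / len(samples)
print(mean)  # should be close to mu = 1.0
```

A Thompson-sampling agent would run a few such steps per round to get one posterior draw, then act greedily with respect to it, avoiding the Gaussian restriction of the Laplace approximation.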