Published: 01 Jan 2021, Last Modified: 11 Feb 2024, ICML 2021, Readers: Everyone
Abstract: Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specifie...