Published: 01 Jan 2023, Last Modified: 17 Aug 2023, AISTATS 2023, Readers: Everyone
Abstract: We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate o...