2019 (modified: 03 Nov 2022), AISTATS 2019. Readers: Everyone
Abstract: In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, meaning one that is much worse than a default production action. In general, this ...