Published: 01 Jan 2023, Last Modified: 11 Feb 2024AISTATS 2023Readers: Everyone
Abstract:Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We ...