Blessing from Experts: Super Reinforcement Learning in Confounded Environments


22 Sept 2022, 12:39 (modified: 18 Nov 2022, 00:16) | ICLR 2023 Conference Blind Submission | Readers: Everyone
Abstract: We introduce super reinforcement learning in the batch setting, which takes the observed action as an additional input for enhanced policy learning. In the presence of unmeasured confounders, the recommendations from human experts recorded in the observed data allow us to recover certain unobserved information. By including this information in the policy search, super reinforcement learning yields a super-policy that is guaranteed to perform at least as well as both the standard optimal policy and the behavior policy (e.g., the expert's recommendation). Furthermore, to address the issue of unmeasured confounding in finding super-policies, we establish a number of non-parametric identification results. Finally, we develop two super-policy learning algorithms and derive their corresponding finite-sample regret guarantees.
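
To make the ordering claimed in the abstract concrete, the following is a minimal sketch on a hypothetical one-step toy problem that is not taken from the paper. All names and numbers (`U_PROBS`, `expert_probs`, `reward`, the 0.8/0.1 probabilities) are illustrative assumptions. The sketch computes policy values by exact enumeration using full knowledge of the unobserved confounder, so it sidesteps the identification problem that the paper addresses; it only illustrates why a policy that takes the expert's recommendation as input can do at least as well as both a policy that ignores it and the expert's own behavior.

```python
from itertools import product

# Hypothetical toy model (an assumption, not the paper's setup):
# one decision step, three actions, no observed state.
U_PROBS = {0: 0.5, 1: 0.25, 2: 0.25}  # distribution of the unobserved confounder U


def expert_probs(u):
    """P(expert recommends a' | U = u).

    The expert sees U. It is reliable when U = 0 but tends to swap
    actions 1 and 2 otherwise, so its recommendation is informative
    about U even when it is not the best action.
    """
    likely = {0: 0, 1: 2, 2: 1}[u]
    return {a: (0.8 if a == likely else 0.1) for a in range(3)}


def reward(u, a):
    """Reward is 1 when the action matches the confounder, else 0."""
    return 1.0 if a == u else 0.0


def value(policy):
    """Exact expected reward of a policy mapping recommendation -> action."""
    return sum(
        p_u * p_rec * reward(u, policy[rec])
        for u, p_u in U_PROBS.items()
        for rec, p_rec in expert_probs(u).items()
    )


# Behavior policy: follow the expert's recommendation as-is.
behavior_value = value({0: 0, 1: 1, 2: 2})

# Standard policy: ignores the recommendation, so (with no observed
# state) it is a constant action; take the best constant.
standard_value = max(value({0: a, 1: a, 2: a}) for a in range(3))

# Super-policy: best mapping from the recommendation to an action,
# found by brute force over all 3^3 mappings.
super_policy = max(
    (dict(enumerate(choice)) for choice in product(range(3), repeat=3)),
    key=value,
)
super_value = value(super_policy)

print(f"standard={standard_value:.2f} behavior={behavior_value:.2f} "
      f"super={super_value:.2f} policy={super_policy}")
```

In this toy model the best super-policy learns to undo the expert's systematic swap of actions 1 and 2. Because both the constant mappings and the identity mapping are among the candidates searched, the super-policy's value can never fall below the standard or behavior values in this sketch.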
