Policy Improvement Over an Alternate Observation Space

ICML 2023 Workshop SCIS Submission 77 Authors

Published: 20 Jun 2023, Last Modified: 28 Jul 2023 (SCIS 2023 Poster)
Keywords: reinforcement learning, causal inference, backdoor adjustment, proxy correction
TL;DR: We provide multiple strategies for de-confounding the naive policy gradient when data was collected by a policy with a different observation space.
Abstract: We consider the problem of improving upon a black-box policy which operates on a different observation space than the learner. Such problems occur when augmenting an existing hand-engineered system with a new machine learning model, or in a shared autonomy / human-AI complementarity context. We prove that following the naive policy gradient can lead to a decrease in performance because of incorrect grounding in a different observation space. Then, if we have access to both sets of observations at train time, we derive a method for correctly estimating the policy gradient via an application of the backdoor criterion. If we do not, we prove that, under certain assumptions, we can use the proxy correction to correctly estimate a direction of improvement.
Submission Number: 77
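
The backdoor-adjustment idea from the abstract can be illustrated with a small, self-contained sketch: when the behavior policy's observation (or its action propensity) is logged alongside the learner's observation, reweighting rewards by the behavior propensity pi_b(a | o_b) blocks the confounding path that the naive policy gradient ignores. This is a generic inverse-propensity illustration in a one-step (contextual-bandit) setting, not the paper's estimator; the data, names, and policy parameterization below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, d = 3, 4  # hypothetical action count and learner-observation dimension


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def learner_probs(theta, o_l):
    # pi_theta(. | o_l): a linear-softmax policy over the learner's observation.
    return softmax(theta @ o_l)


def naive_pg(theta, batch):
    # Naive REINFORCE on logged data; ignores that actions were chosen from the
    # behavior observation o_b, so the gradient estimate is confounded.
    g = np.zeros_like(theta)
    for o_l, a, r, _ in batch:
        p = learner_probs(theta, o_l)
        g += r * np.outer(np.eye(n_actions)[a] - p, o_l)
    return g / len(batch)


def adjusted_pg(theta, batch):
    # Backdoor-style adjustment via inverse propensity weighting: reweight by the
    # behavior propensity pi_b(a | o_b), which blocks the confounding path through
    # the observation the learner does not see. Requires logging pi_b(a | o_b).
    g = np.zeros_like(theta)
    for o_l, a, r, pb_a in batch:
        p = learner_probs(theta, o_l)
        w = p[a] / pb_a  # importance weight pi_theta(a | o_l) / pi_b(a | o_b)
        g += w * r * np.outer(np.eye(n_actions)[a] - p, o_l)
    return g / len(batch)


# Hypothetical logged tuples: (learner observation, action, reward, pi_b(a | o_b)).
batch = [
    (rng.normal(size=d), int(rng.integers(n_actions)), float(rng.normal()),
     1.0 / n_actions)
    for _ in range(64)
]
theta = np.zeros((n_actions, d))
print(naive_pg(theta, batch))
print(adjusted_pg(theta, batch))
```

The two estimators differ only in the importance weight; on synthetic data where actions were in fact chosen from a different observation than the learner's, the unweighted estimate can point in a direction that decreases performance, which is the failure mode the abstract describes.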