Online Reinforcement Learning for Mixed Policy Scopes

Junzhe Zhang; Elias Bareinboim

Published: 31 Oct 2022, Last Modified: 12 Oct 2022, NeurIPS 2022 Accept

Keywords: Causal inference, Reinforcement Learning, Graphical Models

TL;DR: This paper investigates the online reinforcement learning setting for optimizing policies with mixed state-action spaces.

Abstract: Combination therapy refers to the use of multiple treatments (such as surgery, medication, and behavioral therapy) to cure a single disease, and it has become a cornerstone of treatment for various conditions, including cancer, HIV, and depression. All possible combinations of treatments lead to a collection of treatment regimens (i.e., policies) with mixed scopes, that is, regimens that differ in what physicians can observe and in which actions they should take depending on that context. In this paper, we investigate the online reinforcement learning setting for optimizing over a policy space with mixed scopes. In particular, we develop novel online algorithms that achieve sublinear regret compared to an optimal agent deployed in the environment. The regret bound depends on the maximal cardinality of the induced state-action space associated with the mixed scopes. We further introduce a canonical representation for an arbitrary subset of interventional distributions given a causal diagram, which leads to a non-trivial, minimal representation of the model parameters.
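To make the combinatorics concrete, the sketch below enumerates a toy collection of mixed policy scopes and reports the largest induced state-action space among them. It is only an illustration under stated assumptions, not the authors' formalism or algorithm: the treatment and covariate names and domain sizes are hypothetical, a "scope" is modeled as a non-empty subset of treatments with each chosen treatment paired with a subset of covariates it may condition on, and the "induced state-action space" is assumed to be the joint domain of every variable the scope mentions. Causal-diagram constraints (e.g., temporal ordering) are ignored.

```python
from itertools import chain, combinations, product
from math import prod

# Hypothetical toy setting (not from the paper): finite domain sizes for
# treatment (action) variables and for observable context variables.
ACTION_DOMAINS = {"surgery": 2, "medication": 3}
CONTEXT_DOMAINS = {"age_group": 3, "biomarker": 2}


def powerset(items):
    """All subsets of `items`, from the empty tuple up to the full tuple."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def enumerate_scopes(actions, contexts):
    """Yield each mixed scope as a tuple of (action, context_subset) pairs.

    Assumption: a scope picks a non-empty subset of actions, and each picked
    action independently picks which context variables it may observe.
    """
    context_subsets = list(powerset(contexts))
    for acts in powerset(actions):
        if not acts:
            continue  # a regimen must prescribe at least one treatment
        for ctxs in product(context_subsets, repeat=len(acts)):
            yield tuple(zip(acts, ctxs))


def state_action_cardinality(scope):
    """Size of the joint domain of all action and context variables in `scope`.

    This is an assumed stand-in for the "induced state-action space".
    """
    domains = {**ACTION_DOMAINS, **CONTEXT_DOMAINS}
    variables = set()
    for action, ctx in scope:
        variables.add(action)
        variables.update(ctx)
    return prod(domains[v] for v in variables)


if __name__ == "__main__":
    scopes = list(enumerate_scopes(ACTION_DOMAINS, CONTEXT_DOMAINS))
    print(len(scopes))  # 24 scopes in this toy setting
    print(max(state_action_cardinality(s) for s in scopes))  # 36 = 2 * 3 * 3 * 2
```

Even with two treatments and two covariates there are 24 scopes, which is why a bound that scales with the largest single induced space, rather than with the number of scopes, is the more useful quantity in this kind of setting.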
