Keywords: offline reinforcement learning, causal feature selection
TL;DR: Methods that exploit causal/decision-theoretic sparsity under structural restrictions, even when the estimation problem is not sparse
Abstract: This paper studies causal variable selection in the setting of a Markov decision process, specifically offline reinforcement learning with linear function approximation. The structural restriction on the data-generating process is that the transitions factor into sparse dynamics that affect the reward and additional exogenous dynamics that do not. Although the minimally sufficient adjustment set for estimating full-state transition properties depends on the whole state, the optimal policy, and therefore the state-action value function, is sparse. This is a novel "causal sparsity" notion that does not arise in pure estimation settings. We develop methods that restrict estimation of the state-action value function to the sparse component via a modification of thresholded lasso: thresholded lasso recovers the support of the rewards, and this estimated support is then used to estimate the state-action $Q$-function. The resulting method has sample complexity depending only on the size of the sparse component. Although this problem differs from the typical formulation of "causal representation learning", this notion of "causal sparsity" may be of interest, and our methods connect to a classical statistical literature whose theoretical guarantees can serve as a stepping stone toward more complex representation learning.
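The following is a minimal sketch of the two-step procedure described in the abstract, under assumed synthetic linear features: a thresholded lasso regression recovers the reward support, and the $Q$-function is then estimated on the selected features, here using least-squares temporal difference (LSTD) as a stand-in for the paper's estimator. All names, hyperparameters, and data below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: thresholded lasso for reward-support recovery,
# followed by LSTD on the selected features. Data and tuning choices are assumed.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, s = 2000, 50, 5        # samples, ambient feature dimension, sparse reward support size
gamma = 0.9                  # discount factor

# Synthetic offline data: state-action features phi(s, a), next-step features, rewards.
phi = rng.normal(size=(n, d))
phi_next = 0.8 * phi + 0.2 * rng.normal(size=(n, d))   # stand-in for next-state features
theta_r = np.zeros(d)
theta_r[:s] = 1.0                                      # reward depends only on the sparse block
r = phi @ theta_r + 0.1 * rng.normal(size=n)

# Step 1: lasso regression of rewards on features, then threshold small coefficients
# to estimate the reward support.
lasso = Lasso(alpha=0.05).fit(phi, r)
tau = 0.1                                              # threshold (assumed tuning choice)
support = np.flatnonzero(np.abs(lasso.coef_) > tau)

# Step 2: estimate the Q-function restricted to the estimated support,
# here via LSTD on the selected columns only.
X, X_next = phi[:, support], phi_next[:, support]
A = X.T @ (X - gamma * X_next) / n
b = X.T @ r / n
w_hat = np.linalg.solve(A + 1e-6 * np.eye(len(support)), b)

print("estimated reward support:", support)
print("Q-function weights on support:", w_hat)
```

Because both steps operate only on the selected columns, the statistical and computational cost of the second stage scales with the estimated support size rather than the ambient dimension, which is the sample-complexity point made in the abstract.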
Submission Number: 55