Keywords: causal inference, meta-learning, graphical models
TL;DR: Causal MAML uses partial identification to construct counterfactual trajectories from confounded data, enabling robust policy initialization that adapts quickly in target domains.
Abstract: Meta-Reinforcement Learning (Meta-RL) trains policies on data collected from a collection of diverse environments, so that the resulting policy can adapt to new settings with only a few training steps. While many Meta-RL methods have demonstrated success, they often rely on the assumption that unobserved confounders can be excluded \emph{a priori}. This paper investigates robust Meta-RL for sequential decision-making, given confounded observational data collected across multiple heterogeneous environments. We introduce a novel augmentation procedure, called Causal MAML, which employs partial identification methods to generate posterior counterfactual trajectories from candidate environments that are consistent with the confounded observations. These counterfactual trajectories are then used to find a policy initialization with strong generalization performance in the target domain. Our theoretical analysis shows that this causal Meta-RL approach is guaranteed to yield a solution that minimizes the generalization loss.
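To make the augment-then-adapt loop concrete, here is a minimal, self-contained sketch in Python. It is not the paper's implementation: the quadratic surrogate loss, the `sample_counterfactual_targets` helper, and the per-environment interval bounds standing in for partial-identification sets are all illustrative assumptions; the actual method draws posterior counterfactual trajectories in a sequential RL setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical partial-identification bounds per candidate environment:
# each row gives per-dimension [lower, upper] bounds on a counterfactual
# "target" that is consistent with the confounded observational data.
ENV_BOUNDS = np.array([
    [[-1.0, 0.0], [0.5, 1.5]],   # env 0
    [[0.0, 1.0], [-1.0, 0.0]],   # env 1
    [[0.5, 1.5], [0.0, 1.0]],    # env 2
])


def sample_counterfactual_targets(bounds, n_samples):
    """Draw targets uniformly from the partial-identification interval.

    Placeholder for a posterior sampler over counterfactual trajectories
    that agree with the confounded observations."""
    low, high = bounds[:, 0], bounds[:, 1]
    return rng.uniform(low, high, size=(n_samples, len(low)))


def loss_and_grad(theta, targets):
    """Toy surrogate loss: mean squared distance to the sampled targets."""
    diff = theta - targets                       # shape (n_samples, dim)
    loss = np.mean(np.sum(diff ** 2, axis=1))
    grad = 2.0 * np.mean(diff, axis=0)
    return loss, grad


def meta_train(n_iters=200, alpha=0.1, beta=0.05, n_cf=32):
    theta = np.zeros(2)                          # meta policy initialization
    for _ in range(n_iters):
        meta_grad = np.zeros_like(theta)
        for bounds in ENV_BOUNDS:
            targets = sample_counterfactual_targets(bounds, n_cf)
            # Inner loop: one adaptation step on the candidate environment.
            _, g_inner = loss_and_grad(theta, targets)
            theta_adapted = theta - alpha * g_inner
            # Outer loop: gradient of the post-adaptation loss w.r.t. theta.
            # For this quadratic loss, d(theta_adapted)/d(theta) = (1 - 2*alpha) * I.
            _, g_outer = loss_and_grad(theta_adapted, targets)
            meta_grad += (1.0 - 2.0 * alpha) * g_outer
        theta -= beta * meta_grad / len(ENV_BOUNDS)
    return theta


if __name__ == "__main__":
    print("meta-learned initialization:", meta_train())
```

The outer update differentiates through the one-step inner adaptation, which is the MAML mechanism the abstract refers to; the causal ingredient shown here is only that each environment contributes samples drawn from a set consistent with the (hypothetical) partial-identification bounds rather than from point estimates.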
Primary Area: causal reasoning
Submission Number: 12557