Keywords: causal inference, meta-learning, graphical models
TL;DR: Causal MAML uses partial identification to construct counterfactual trajectories from confounded data, enabling robust policy initialization that adapts quickly in target domains.
Abstract: Meta-Reinforcement Learning (Meta-RL) trains policies on data collected from a collection of diverse environments, so that the resulting policy can adapt to new settings with only a few training steps. While many Meta-RL methods have demonstrated success, they often rely on the assumption that unobserved confounders can be excluded \emph{a priori}. This paper investigates robust Meta-RL for sequential decision-making, given confounded observational data collected across multiple heterogeneous environments. We introduce a novel augmentation procedure, called Causal MAML, which employs partial identification methods to generate posterior counterfactual trajectories from candidate environments that are consistent with the confounded observations. These counterfactual trajectories are then used to find a policy initialization with strong generalization performance in the target domain. Our theoretical analysis shows that this causal Meta-RL approach is guaranteed to yield a solution that minimizes the generalization loss.
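To make the augment-then-adapt loop concrete, here is a minimal, self-contained sketch in Python. It is not the paper's implementation: the quadratic surrogate loss, the `sample_counterfactual_targets` helper, and the per-environment interval bounds standing in for partial-identification sets are all illustrative assumptions; the actual method draws posterior counterfactual trajectories in a sequential RL setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical partial-identification bounds per candidate environment:
# each row gives per-dimension [lower, upper] bounds on a counterfactual
# "target" that is consistent with the confounded observational data.
ENV_BOUNDS = np.array([
    [[-1.0, 0.0], [0.5, 1.5]],   # env 0
    [[0.0, 1.0], [-1.0, 0.0]],   # env 1
    [[0.5, 1.5], [0.0, 1.0]],    # env 2
])


def sample_counterfactual_targets(bounds, n_samples):
    """Draw targets uniformly from the partial-identification interval.

    Placeholder for a posterior sampler over counterfactual trajectories
    that agree with the confounded observations."""
    low, high = bounds[:, 0], bounds[:, 1]
    return rng.uniform(low, high, size=(n_samples, len(low)))


def loss_and_grad(theta, targets):
    """Toy surrogate loss: mean squared distance to the sampled targets."""
    diff = theta - targets                       # shape (n_samples, dim)
    loss = np.mean(np.sum(diff ** 2, axis=1))
    grad = 2.0 * np.mean(diff, axis=0)
    return loss, grad


def meta_train(n_iters=200, alpha=0.1, beta=0.05, n_cf=32):
    theta = np.zeros(2)                          # meta policy initialization
    for _ in range(n_iters):
        meta_grad = np.zeros_like(theta)
        for bounds in ENV_BOUNDS:
            targets = sample_counterfactual_targets(bounds, n_cf)
            # Inner loop: one adaptation step on the candidate environment.
            _, g_inner = loss_and_grad(theta, targets)
            theta_adapted = theta - alpha * g_inner
            # Outer loop: gradient of the post-adaptation loss w.r.t. theta.
            # For this quadratic loss, d(theta_adapted)/d(theta) = (1 - 2*alpha) * I.
            _, g_outer = loss_and_grad(theta_adapted, targets)
            meta_grad += (1.0 - 2.0 * alpha) * g_outer
        theta -= beta * meta_grad / len(ENV_BOUNDS)
    return theta


if __name__ == "__main__":
    print("meta-learned initialization:", meta_train())
```

The outer update differentiates through the one-step inner adaptation, which is the MAML mechanism the abstract refers to; the causal ingredient shown here is only that each environment contributes samples drawn from a set consistent with the (hypothetical) partial-identification bounds rather than from point estimates.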
Primary Area: causal reasoning
Submission Number: 12557