Out of Distribution Adaptation in Offline RL via Causal Normalizing Flows

Published: 29 Nov 2025 · Last Modified: 28 Jan 2026 · OpenReview Archive Direct Upload · Everyone · CC BY 4.0
Abstract: Despite the success of reinforcement learning (RL), the common assumption of online interaction prevents its widespread adoption. Offline RL has emerged as an alternative that learns a policy from pre-collected data. However, this learning paradigm introduces a new challenge called "distributional shift," which degrades the policy's performance when it is evaluated on out-of-distribution (OOD) scenarios (i.e., scenarios outside the training data). Most existing works address this through policy regularization, constraining the optimized policy to the support of the data. This, however, overlooks potential high-reward regions outside the data, motivating offline policy optimization methods capable of finding them. In this paper, we devise a causality-based model architecture to accurately capture the OOD scenarios in which the policy can be optimized without performance degradation. Specifically, we adapt causal normalizing flows (CNFs) to learn the transition dynamics and reward function for data generation and augmentation in offline policy learning. Based on a physics-based qualitative causal graph and the pre-collected data, we develop a model-based offline OOD-adapting causal RL (MOOD-CRL) algorithm to learn the quantitative structural causal model. Consequently, MOOD-CRL can exercise counterfactual reasoning for sequential decision-making, revealing high potential for OOD adaptation. Its effectiveness is validated through extensive empirical evaluations, with ablations over data quality and algorithmic sensitivity. Our results show that MOOD-CRL achieves results comparable to its online counterparts and consistently outperforms state-of-the-art model-free and model-based baselines by a significant margin.
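The counterfactual data augmentation described in the abstract can be illustrated with a toy example. The sketch below is not the paper's CNF architecture: it uses the simplest possible invertible mechanisms (affine maps of exogenous noise, with made-up coefficients) in a hypothetical causal order (state, action) → (next state, reward). It only shows the three-step counterfactual pattern a learned causal flow enables: abduct the noise from an observed transition, intervene on the action, and regenerate the transition that would have occurred.

```python
# Toy structural causal model: each mechanism is an invertible affine map of
# an exogenous noise term, the simplest instance of a causal normalizing flow.
# All coefficients are hypothetical illustration values, not from the paper.
A_S, B_S = 0.9, 0.5    # next_state = A_S*s + B_S*a + z_s
A_R, B_R = 1.0, -0.1   # reward     = A_R*s + B_R*a + z_r

def forward(s, a, z_s, z_r):
    """Generate (next_state, reward) from noise via the causal mechanisms."""
    return A_S * s + B_S * a + z_s, A_R * s + B_R * a + z_r

def abduct(s, a, s_next, reward):
    """Invert the flow: recover the exogenous noise of an observed transition."""
    return s_next - (A_S * s + B_S * a), reward - (A_R * s + B_R * a)

def counterfactual(s, a_obs, s_next, reward, a_new):
    """Abduction -> action intervention -> prediction: the transition that
    *would* have occurred under a different action, reusing the same noise."""
    z_s, z_r = abduct(s, a_obs, s_next, reward)
    return forward(s, a_new, z_s, z_r)

# Observed transition, then: what if the agent had taken a different action?
s, a, s_next, r = 1.0, 0.0, 1.2, 0.9
cf_s_next, cf_r = counterfactual(s, a, s_next, r, a_new=1.0)
print(cf_s_next, cf_r)  # 1.7 0.8
```

Such counterfactual transitions, generated for actions the behavior policy never took, are what augment the offline dataset and let the learner evaluate OOD actions without real environment interaction.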