Abstract: Out-of-distribution (OOD) generalization is critical for applying reinforcement learning algorithms to real-world applications. To address the OOD problem, recent works focus on learning an OOD-adaptive policy by capturing the causal factors that affect the environmental dynamics. However, these works recover the causal factors in only an entangled or binary form, which limits the generalization of the policy and requires extra data from the testing environments. To overcome this limitation, we propose generalizable causal reinforcement learning (GCRL), which learns a disentangled representation of the causal factors and, on that basis, a policy that achieves OOD generalization without extra training. To capture the causal factors, GCRL employs a weakly supervised signal with a two-stage constraint to ensure that all factors can be disentangled. Then, to achieve OOD generalization through the causal factors, we establish the dependence of actions on the learned representation and optimize the policy model across multiple environments. Experimental results show that the established dependence recovers the correct relationship between causal factors and actions when the learned policy can solve the target tasks in the training environments. Benefiting from the recovered relationship, GCRL achieves OOD generalization on eight benchmarks from Causal World and MuJoCo. Moreover, the policy learned by our model is more explainable and can be controlled to generate semantic actions by intervening on the representation of the causal factors.
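The abstract describes a two-part pipeline: an encoder that maps observations to a disentangled representation of causal factors, and a policy whose actions depend on that representation. The sketch below is a minimal conceptual illustration of this structure, not the authors' implementation; the module names, network sizes, and dimensions are assumptions for illustration only, and the weakly supervised two-stage constraint and multi-environment training loop are omitted.

```python
# Conceptual sketch of the representation-then-policy structure described in
# the abstract (hypothetical names and dimensions; not the paper's code).
import torch
import torch.nn as nn


class CausalEncoder(nn.Module):
    """Maps an observation to a vector whose dimensions are intended to align
    with independent causal factors of the environment dynamics."""

    def __init__(self, obs_dim: int, n_factors: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_factors),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class FactorConditionedPolicy(nn.Module):
    """Policy whose actions depend on the learned causal-factor representation,
    so interventions on individual factors change the resulting actions."""

    def __init__(self, n_factors: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_factors, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, factors: torch.Tensor) -> torch.Tensor:
        return self.net(factors)


if __name__ == "__main__":
    obs_dim, n_factors, act_dim = 16, 4, 2
    encoder = CausalEncoder(obs_dim, n_factors)
    policy = FactorConditionedPolicy(n_factors, act_dim)

    # Observations would be drawn from several training environments; at test
    # time the same encoder/policy pair is applied to OOD environments.
    obs = torch.randn(8, obs_dim)
    factors = encoder(obs)
    actions = policy(factors)
    print(actions.shape)  # torch.Size([8, 2])
```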