Policy-Aware Learning of Transition Models Using a Causal Approach

Published: 12 Dec 2024 · Last Modified: 06 Mar 2025 · AAAI 2025 Workshop AICT (Oral) · CC BY 4.0
Keywords: Causality, Reinforcement Learning, Safe AI, Interventions
TL;DR: A transition model learning procedure that uses a causal method to emphasise training observations resembling the data the agent will encounter at deployment
Abstract: Predicting what will happen when a reinforcement learning (RL) agent is deployed to the real world is important for providing safety guarantees about its behaviour. In some cases, the agent's training experience can differ significantly from its deployment experience. As a result, learning a transition model from the unadjusted training data can lead to poor performance when predicting the agent's behaviour under the optimal policy. To mitigate this issue, we propose a policy matching (PM) algorithm based on the causal Bayesian network factorisation. It adjusts transition model learning to account for the difference between the agent's interventions at training time and under the optimal policy. Experiments in popular RL environments demonstrate that the PM method improves transition model performance at deployment when the model misgeneralisation problem is otherwise severe.
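A minimal sketch of one plausible reading of the idea: reweight logged transitions by how likely the optimal policy is to take the logged action relative to the behaviour (training) policy, then fit the transition model on the reweighted data. The policies, environment sizes, and weighted-count estimator below are all illustrative assumptions; the paper's actual PM algorithm and its causal Bayesian network factorisation are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Hypothetical policies as (state, action) probability tables.
beta = np.full((n_states, n_actions), 0.5)   # uniform behaviour policy at training
pi_star = np.tile([0.1, 0.9], (n_states, 1)) # assumed optimal policy favouring action 1

def policy_matching_weights(states, actions, pi_star, beta):
    """Importance weight w_i = pi*(a_i | s_i) / beta(a_i | s_i)."""
    return pi_star[states, actions] / beta[states, actions]

def fit_transition_model(states, actions, next_states, weights):
    """Weighted-count estimate of P(s' | s, a) from logged transitions."""
    counts = np.zeros((n_states, n_actions, n_states))
    np.add.at(counts, (states, actions, next_states), weights)
    totals = counts.sum(axis=-1, keepdims=True)
    # Fall back to uniform where a (state, action) pair was never observed.
    return np.divide(counts, totals,
                     out=np.full_like(counts, 1.0 / n_states),
                     where=totals > 0)

# Simulated logged training data with toy deterministic dynamics.
N = 1000
states = rng.integers(0, n_states, N)
actions = rng.integers(0, n_actions, N)
next_states = (states + actions) % n_states

w = policy_matching_weights(states, actions, pi_star, beta)
P = fit_transition_model(states, actions, next_states, w)
print(P.shape)  # (3, 2, 3)
```

Transitions taken under actions the optimal policy favours receive weight 0.9/0.5 = 1.8, while disfavoured ones receive 0.2, so the fitted model is biased toward the state-action distribution expected at deployment.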
Submission Number: 16