Interpretable (Un)Controllable Features in MDP's

Jacob Eeuwe Kooi; Mark Hoogendoorn; Vincent Francois-Lavet

Published: 20 Jul 2023, Last Modified: 30 Aug 2023, EWRL16, Readers: Everyone

Keywords: Representation Learning, Interpretability, Reinforcement Learning

TL;DR: Disentanglement of the latent representation of MDPs into a controllable and an uncontrollable partition.

Abstract: In the context of MDPs with high-dimensional states, downstream tasks are predominantly applied on a compressed, low-dimensional representation of the original input space. A variety of learning objectives have therefore been used to attain useful representations. However, these representations usually lack interpretability of the different features. We present a novel approach that is able to disentangle latent features into a controllable and an uncontrollable partition. We illustrate that the resulting partitioned representations are easily interpretable on three types of environments and show that, in a distribution of procedurally generated maze environments, it is feasible to employ a planning algorithm in the isolated controllable latent partition while still improving performance.
