Multistep Inverse Is Not All You Need

Published: 15 May 2024, Last Modified: 14 Nov 2024RLC 2024EveryoneRevisionsBibTeXCC BY 4.0
Keywords: representation learning, model-based, multi-step inverse, controllable, endogenous
TL;DR: We propose a new method for learning endogenous latent state representations, by identifying types of environments where the multi-step inverse method proposed by Lamb et al. (2022) fails and designing our method to succeed in these cases.
Abstract: In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the *controllable* dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) proposes the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a *multistep-inverse* method, in that it uses the encoding of the the first and last state in a path to predict the *first* action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations; as well as high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.
Submission Number: 117
Loading