Keywords: World Models, Physically Interpretable Representation Learning, Autoencoders
Abstract: Deep learning models are increasingly employed for perception, prediction, and control in autonomous systems. To produce realistic and consistent outputs, these models must embed physical knowledge in their learned representations.
However, doing so is difficult with high-dimensional observations such as images, particularly when system knowledge is incomplete and state sensing is imprecise. To address this, we propose Physically Interpretable World Models, a novel architecture that aligns learned latent representations with real-world physical quantities. The architecture combines a physically interpretable image autoencoder with a partially known, learnable dynamical model. We analyze the latent space in depth, evaluating the effects of continuous versus discrete representations and of intrinsic versus extrinsic physically interpretable encodings. Training uses weak distributional supervision, which removes the impractical reliance on ground-truth physical knowledge. Across three case studies, we show that our approach not only provides physical interpretability but also achieves higher state prediction accuracy than state-of-the-art models, advancing interpretable representation learning.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 6319