Structural and Compact Latent Representation Learning on Sparse Reward Environments

Published: 01 Jan 2023 · Last Modified: 04 Apr 2025 · ACIIDS (2) 2023 · License: CC BY-SA 4.0
Abstract: To train an RL agent in a sparse-reward environment with image-based observations, the agent must both learn a good latent representation and follow an effective exploration strategy. Standard approaches such as the variational auto-encoder (VAE) can learn such representations. However, these approaches only encode input observations into a pre-defined latent distribution and do not take the dynamics of the environment into account. To improve training from high-dimensional input images, we extend the standard VAE framework to learn a compact latent representation that mimics the structure of the underlying Markov decision process. We further add an intrinsic reward based on the learned latent space to encourage exploratory actions in sparse-reward environments. The intrinsic reward is designed to direct the policy toward distant states in the latent space. Experiments on several gridworld environments with sparse rewards demonstrate the effectiveness of our proposed approach. Compared to other baselines, our method achieves more stable performance and better exploration coverage by exploiting the structure of the learned latent space.
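The abstract does not spell out the exact form of the intrinsic reward, but a distance-based novelty bonus in latent space is one natural reading of "direct the policy to visit distant states". Below is a minimal sketch under that assumption: the class name `LatentNoveltyBonus`, the `scale` hyperparameter, and the nearest-neighbor distance bonus are illustrative choices, not the paper's actual method, and the encoder is assumed to be the trained VAE-style encoder.

```python
import numpy as np

class LatentNoveltyBonus:
    """Illustrative intrinsic reward: the bonus grows with the distance
    between the current state's latent code and the latents of previously
    visited states, pushing the policy toward unexplored regions of the
    latent space. This is an assumed sketch, not the paper's exact reward."""

    def __init__(self, scale: float = 0.1):
        self.scale = scale
        self.visited: list[np.ndarray] = []  # latent codes seen so far

    def reward(self, z: np.ndarray) -> float:
        """z: latent code of the current observation (1-D array)."""
        if not self.visited:
            self.visited.append(z)
            return self.scale
        # Distance to the nearest previously visited latent: large when the
        # agent reaches a region of latent space it has not covered yet.
        dists = np.linalg.norm(np.stack(self.visited) - z, axis=1)
        self.visited.append(z)
        return self.scale * float(dists.min())
```

In use, the bonus would simply be added to the environment's sparse extrinsic reward, e.g. `total_reward = r_ext + bonus.reward(encoder(obs))`, where `encoder` denotes the learned latent encoder (hypothetical name).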