Policy Generation from Latent Embeddings for Reinforcement Learning

Published: 2023, Last Modified: 13 May 2025, ISPR (2) 2023, CC BY-SA 4.0
Abstract: The human brain endows us with extraordinary capabilities that allow us to create, imagine, and generate whatever we desire. In particular, we possess striking imaginative skills that let us derive concrete knowledge from abstract concepts. Motivated by these traits, several areas of machine learning, notably unsupervised learning and reinforcement learning, have begun to build such ideas into their core. Nevertheless, these methods are not without fault. A fundamental issue with reinforcement learning, especially when neural networks are used as function approximators, is the limited optimality achievable when each policy is learnt from tabula rasa. Because of how neural networks learn, the behaviours obtained for each task are inconsistent, and a unified approach in which such optimal policies exist within a shared parameter space would benefit both the learning procedure and the resulting behaviours. We are therefore interested in whether reinforcement learning can be supported by unsupervised learning methods in a way that alleviates this shortcoming. This work analyses the feasibility of using generative models to encode learnt reinforcement learning policies (i.e. their model parameters), with the intention of conditionally sampling the learnt policy-latent space to generate new policies. We demonstrate that, under the currently proposed architecture, these models are able to recreate policies on simple tasks but fail on more complex ones. We therefore provide a critical analysis of these failures and discuss further improvements that would support the development of this work.
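To make the core idea concrete, the following is a minimal sketch, not the authors' architecture, of how a generative model could be fitted to flattened policy parameter vectors and then conditionally sampled to produce new policies. It assumes a conditional VAE in PyTorch; all names (PolicyVAE, param_dim, task_dim, sample_policy) are illustrative assumptions rather than anything specified in the paper.

```python
import torch
import torch.nn as nn

class PolicyVAE(nn.Module):
    """Conditional VAE over flattened policy parameter vectors (illustrative sketch)."""
    def __init__(self, param_dim: int, task_dim: int, latent_dim: int = 32):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(
            nn.Linear(param_dim + task_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),          # outputs mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + task_dim, 256), nn.ReLU(),
            nn.Linear(256, param_dim),               # reconstructs a policy parameter vector
        )

    def forward(self, params: torch.Tensor, task: torch.Tensor):
        mu, log_var = self.encoder(torch.cat([params, task], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterisation trick
        recon = self.decoder(torch.cat([z, task], dim=-1))
        return recon, mu, log_var

    @torch.no_grad()
    def sample_policy(self, task: torch.Tensor) -> torch.Tensor:
        # Conditionally sample the policy-latent space to generate a new parameter vector.
        z = torch.randn(task.shape[0], self.latent_dim)
        return self.decoder(torch.cat([z, task], dim=-1))

# Usage sketch: encode a batch of trained policy parameter vectors for 3 tasks,
# then generate a fresh parameter vector conditioned on task 0.
vae = PolicyVAE(param_dim=4096, task_dim=3)
params = torch.randn(8, 4096)                      # stand-in for flattened trained policies
task = torch.eye(3)[torch.randint(0, 3, (8,))]     # one-hot task conditioning
recon, mu, log_var = vae(params, task)
new_policy = vae.sample_policy(torch.eye(3)[[0]])  # shape: (1, 4096)
```

The sampled vector would then be reshaped back into the policy network's layer shapes and evaluated on the corresponding task; the abstract's reported failures on more complex tasks suggest this reconstruction step is where fidelity breaks down.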