Keywords: Offline Reinforcement Learning, Latent Action Space, Representation Learning, Robotic Manipulation
Abstract: The performance and computational efficiency of latent-space offline Reinforcement Learning (RL) methods critically depend on the quality of learned representations. In approaches such as Policy in Latent Action Space (PLAS), this representation is obtained via a Conditional Variational Autoencoder (CVAE) trained during a fixed warmup phase prior to policy optimization. We propose PLASDecoupledCVAE, a modular extension that explicitly decouples CVAE pre-training from policy learning. Our framework exposes the CVAE warmup as an independent, configurable stage equipped with adaptive convergence detection based on reconstruction-loss plateaus. This design enables a train-once, reuse-many paradigm, allowing a learned latent action space to be reused across multiple policy optimization runs and substantially reducing computational overhead. From an Automated Reinforcement Learning (AutoRL) perspective, we argue that the CVAE pre-training schedule should be treated as a tunable, data-dependent component rather than a fixed-step hyperparameter. We empirically evaluate fixed and adaptive warmup strategies on the Fetch robotic manipulation benchmark across multiple convergence thresholds. Our results show that adaptive warmup can yield performance gains of up to $+21.92\%$ over the standard PLAS baseline, although optimal configurations are task-dependent. Notably, we find that lower CVAE reconstruction loss does not guarantee better policy performance, highlighting the non-trivial relationship between generative model convergence and downstream task success. We release our implementation as a community package; the code is available at https://github.com/somenomus/PLASDecoupledCVAE
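To make the idea of plateau-based warmup termination concrete, the following is a minimal sketch of one way such a detector could work. It is an illustration only, not the released implementation: the names `PlateauDetector`, `rel_threshold`, `patience`, and `warmup_steps` are hypothetical, and the stopping rule (no relative improvement larger than a threshold for a number of consecutive checks) is an assumed stand-in for whatever criterion the package actually uses. It also assumes a non-negative reconstruction loss, such as a mean squared error.

```python
class PlateauDetector:
    """Hypothetical plateau detector for a non-negative reconstruction loss.

    Declares convergence once the loss has failed to improve on the best
    value seen so far by more than `rel_threshold` (relative) for
    `patience` consecutive updates.
    """

    def __init__(self, rel_threshold=0.01, patience=3):
        self.rel_threshold = rel_threshold
        self.patience = patience
        self.best = float("inf")
        self.stale = 0  # consecutive updates without sufficient improvement

    def update(self, loss):
        """Record one loss value; return True if a plateau is detected."""
        if loss < self.best * (1.0 - self.rel_threshold):
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


def warmup_steps(loss_stream, detector, max_steps):
    """Consume losses until a plateau is detected or `max_steps` is reached.

    `max_steps` plays the role of the fixed warmup budget, so the adaptive
    schedule never runs longer than the fixed one would.
    """
    steps = 0
    for loss in loss_stream:
        steps += 1
        if detector.update(loss) or steps >= max_steps:
            break
    return steps


# Synthetic loss curve (illustrative values, not experimental data).
losses = [1.0, 0.5, 0.3, 0.299, 0.2985, 0.298, 0.1]
adaptive = warmup_steps(losses, PlateauDetector(0.01, 3), max_steps=100)
capped = warmup_steps(losses, PlateauDetector(0.01, 3), max_steps=4)
```

In this sketch the convergence threshold and the patience are the tunable quantities that an AutoRL procedure could search over, in line with treating the warmup schedule as data-dependent rather than as a fixed step count.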
Journal Edition Interest: Yes
Submission Number: 9