Data, Auxiliary Losses, or Normalization Layers for Plasticity? A Case Study with PPO on Atari

Published: 17 Jul 2025, Last Modified: 06 Sept 2025 · EWRL 2025 Poster · CC BY 4.0
Keywords: plasticity loss, ppo, atari, regularization
Abstract: We compare the impact of data, auxiliary losses, and normalization layers (the input, output, and architecture perspectives, respectively) on mitigating plasticity loss in deep reinforcement learning, through a case study with Proximal Policy Optimization (PPO), a widely used on-policy algorithm, on the Arcade Learning Environment (ALE, Atari), a standard benchmark suite for vision-based discrete control. Although many interventions have been proposed to address plasticity loss, the inability of a deep network to continue learning as training progresses, no single solution has emerged. We find that neither richer input information nor reduced gradient noise from larger batch sizes prevents collapse. Additionally, we categorize auxiliary loss interventions by the component being regularized and the target of the regularization. Using this taxonomy, we identify solutions unexplored in the current literature and, as an illustration, derive a previously unstudied intervention, CHAIN-SP. Among the loss interventions that require tuning, churn-reduction auxiliary losses achieve the best performance and training stability. Finally, among the normalization layers we test, LayerNorm mitigates plasticity loss best.
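
The sketch below is a minimal, hypothetical illustration (not the paper's implementation) of two of the ingredients the abstract refers to: LayerNorm inserted after each hidden layer of the policy network, and a churn-reduction auxiliary loss in the spirit of CHAIN that penalizes the policy for changing its action distribution on a reference batch between consecutive updates. All names, layer sizes, and the 0.1 coefficient are assumptions; the exact CHAIN-SP formulation is not shown here.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class AtariPolicy(nn.Module):
    """Small actor head over pre-extracted features; LayerNorm follows each
    hidden layer, the normalization the abstract reports as most effective."""

    def __init__(self, feat_dim: int, n_actions: int, hidden: int = 512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
        )
        self.logits = nn.Linear(hidden, n_actions)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.logits(self.body(feats))


def churn_reduction_loss(policy: nn.Module,
                         frozen_policy: nn.Module,
                         ref_feats: torch.Tensor) -> torch.Tensor:
    """KL divergence from the pre-update (frozen) policy to the current policy
    on a reference batch; added to the PPO objective, it discourages churn."""
    with torch.no_grad():
        old_logp = F.log_softmax(frozen_policy(ref_feats), dim=-1)
    new_logp = F.log_softmax(policy(ref_feats), dim=-1)
    # KL(old || new), averaged over the batch.
    return F.kl_div(new_logp, old_logp, log_target=True, reduction="batchmean")


# Hypothetical usage inside one PPO epoch:
# frozen = copy.deepcopy(policy).eval()          # snapshot before the updates
# loss = ppo_clip_loss + 0.1 * churn_reduction_loss(policy, frozen, ref_feats)
```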
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Daniil_Pyatko1
Track: Regular Track: unpublished work
Submission Number: 118