Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 Oral · CC BY 4.0
TL;DR: Integrating network sparsity into the most advanced architectures can further unlock the scaling potential of DRL models while effectively mitigating optimization pathologies during scaling.
Abstract: Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic resets and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential in state-of-the-art architectures, beyond what their dense counterparts achieve. This is accomplished through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.
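To make the one-shot random pruning idea concrete, here is a minimal sketch of how a static sparse mask could be created before training and kept fixed afterward. This is an illustrative PyTorch example, not the paper's released implementation; the function names (`random_prune_`, `apply_masks_`) and the 90% sparsity level are assumptions chosen for demonstration.

```python
# Minimal sketch of one-shot random pruning with a static binary mask (PyTorch).
# Illustrative only; see the linked repository for the authors' actual code.
import torch
import torch.nn as nn


def random_prune_(module: nn.Module, sparsity: float = 0.9) -> dict:
    """Zero out a random `sparsity` fraction of each weight matrix once,
    before training, and return the static binary masks."""
    masks = {}
    for name, param in module.named_parameters():
        if param.dim() < 2:  # skip biases and normalization parameters
            continue
        mask = (torch.rand_like(param) > sparsity).float()
        param.data.mul_(mask)
        masks[name] = mask
    return masks


def apply_masks_(module: nn.Module, masks: dict) -> None:
    """Re-apply the static masks (e.g., after each optimizer step) so the
    pruned connections never grow back."""
    with torch.no_grad():
        for name, param in module.named_parameters():
            if name in masks:
                param.mul_(masks[name])


if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 4))
    masks = random_prune_(net, sparsity=0.9)  # one-shot pruning, before training
    # ... inside the training loop, call apply_masks_(net, masks) after optimizer.step()
```

Because the masks are fixed at initialization, the sparsity pattern is static: no weights are regrown or re-pruned during training, which is what distinguishes this scheme from dynamic sparse training methods.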
Lay Summary: When we use large networks for deep reinforcement learning, training often breaks down: the models crash, forget early skills, or get stuck. Researchers usually fight these problems with complex fixes such as special layers and frequent "resets". We offer a simpler idea: before training, randomly cut a fraction of the network's connections and never add them back. This one-shot random pruning produces a static sparse network that, even with fewer weights, learns faster and scores higher than its dense counterpart. We also demonstrate that the static sparse network resists common failures such as plasticity loss and gradient interference.
Link To Code: https://github.com/lilucse/SparseNetwork4DRL
Primary Area: Reinforcement Learning->Deep RL
Keywords: Deep Reinforcement Learning, Network Sparsity, Scaling, Plasticity Loss, Regularization
Submission Number: 3388