ScaleAC: Scale Actor-Critic by Replay Ratio

16 Sept 2025 (modified: 06 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: reinforcement learning; replay ratio; dormant neuron; sample efficiency
Abstract: Employing a high replay ratio, defined as the number of updates of an agent's network parameters per environment interaction, has recently become a promising strategy for improving sample efficiency in reinforcement learning (RL). However, most existing efforts to scale the replay ratio effectively stagnate at small values, leaving the potential of scaling it to the hundreds underexplored. In this paper, we aim to break the bottleneck of replay ratio scaling to achieve sample-efficient RL. We start from the critical pathology that simply increasing the replay ratio causes severe neuron dormancy in the critic network of actor-critic (AC) algorithms, which fundamentally undermines the learning process. To address this problem, we propose a novel method called ScaleAC, which is built upon advanced AC algorithms (e.g., REDQ, DrQ-v2). First, ScaleAC introduces a periodic soft reset of network parameters to reduce dormant neurons when the critic is updated at high frequency. Second, ScaleAC diversifies the replay experience through two kinds of data augmentation to prevent overfitting. Experiments across diverse MuJoCo and DMC tasks demonstrate that ScaleAC achieves effective RL training at high replay ratios of up to 256 in vector-based RL and 8 in pixel-based RL, yielding substantial learning acceleration and performance improvements.
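To make the two core quantities in the abstract concrete, the sketch below illustrates a high-replay-ratio update loop with a periodic soft parameter reset on the critic. This is a minimal illustration, not the authors' implementation: the shrink-and-perturb-style interpolation in `soft_reset`, the coefficient `alpha`, the `reset_interval`, and the `critic_update` callable are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): replay_ratio gradient steps per
# environment step, plus a periodic soft reset of the critic's parameters.
import copy
import torch
import torch.nn as nn


def soft_reset(network: nn.Module, alpha: float = 0.8) -> None:
    """Interpolate current parameters toward a freshly initialized copy.

    alpha = 1.0 keeps the trained weights unchanged; alpha = 0.0 is a full re-init.
    The exact interpolation scheme here is an assumption for illustration.
    """
    fresh = copy.deepcopy(network)
    for module in fresh.modules():
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()
    with torch.no_grad():
        for p, p_fresh in zip(network.parameters(), fresh.parameters()):
            p.mul_(alpha).add_((1.0 - alpha) * p_fresh)


def train(env_steps, critic, critic_update, replay_ratio=256, reset_interval=50_000):
    """Hypothetical training loop: `replay_ratio` critic updates per environment step."""
    num_updates = 0
    for _ in range(env_steps):
        # ... collect one transition and add it to the replay buffer ...
        for _ in range(replay_ratio):
            critic_update()  # one gradient step on sampled replay data
            num_updates += 1
            if num_updates % reset_interval == 0:
                soft_reset(critic, alpha=0.8)
```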
Primary Area: reinforcement learning
Submission Number: 7459