Abstract: Highlights
• The method stabilizes reinforcement learning by reducing intermittent gradient spikes.
• Intermittent gradient spikes are controlled using adaptive learning rate adjustments.
• The method preserves initial gradient norms, aiding stable value learning.
• Experiments demonstrate improved stability and performance in reinforcement learning.