Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · CC BY 4.0
TL;DR: Normalizing weights and features onto the unit-norm hypersphere allows scaling up parameters and computation in RL.
Abstract: Scaling up model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to transfer to reinforcement learning (RL), where training on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norms through hyperspherical normalization, and (ii) using distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using soft actor-critic as the base algorithm, SimbaV2 scales effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains.
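
The sketch below illustrates the first ingredient, hyperspherical normalization, in JAX: weights and features are kept on the unit-norm hypersphere via L2 normalization. It is a minimal illustration under assumed shapes and function names (`l2_normalize`, `hypersphere_linear` are hypothetical), not the authors' implementation; see the linked repository for the actual code.

```python
import jax
import jax.numpy as jnp

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project x onto the unit hypersphere along `axis`.
    return x / (jnp.linalg.norm(x, axis=axis, keepdims=True) + eps)

def hypersphere_linear(params, x):
    # Illustrative linear layer whose weights and features stay unit-norm.
    w = l2_normalize(params["w"], axis=0)   # each weight vector lies on the sphere
    h = l2_normalize(x, axis=-1) @ w        # normalize input features, then project
    return l2_normalize(h, axis=-1)         # re-normalize the output features

# Usage example (shapes are arbitrary):
key = jax.random.PRNGKey(0)
params = {"w": l2_normalize(jax.random.normal(key, (64, 128)), axis=0)}
x = jax.random.normal(jax.random.PRNGKey(1), (32, 64))
y = hypersphere_linear(params, x)           # every row of y has unit L2 norm
```

Because both weights and features are re-projected onto the sphere, their norms cannot grow during training, which is the mechanism the paper credits for stable optimization under non-stationary data.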
Lay Summary: Scaling up models improves performance in supervised learning but often leads to instability in reinforcement learning due to non-stationary data. SimbaV2 addresses this by constraining the growth of parameter and gradient norms, stabilizing optimization. This enables effective scaling with larger models and compute, achieving state-of-the-art performance in reinforcement learning tasks.
Link To Code: https://github.com/dojeon-ai/SimbaV2
Primary Area: Reinforcement Learning->Deep RL
Keywords: reinforcement learning, normalization
Submission Number: 11664