Keywords: Cautious Optimizer, Optimization, Riemannian Optimization, Sphere
Abstract: Recent studies have shown that optimizing parameters under spherical constraints or with scale invariance can effectively improve model performance. This paper proposes the Spherical Cautious Optimizers. Standard cautious optimizers prevent overshooting by element-wise masking of update coordinates whose signs are inconsistent with the gradient. However, applying them directly to spherical or scale-invariant parameters lets radial noise severely interfere with this sign judgment, introducing geometric distortion and disrupting convergence. The Spherical Cautious Optimizers instead determine the masks solely from the sign consistency of updates and gradients in the tangent space, ensuring that decisions are guided by the true feature-learning directions, and employ retraction to keep the optimization trajectory aligned with the manifold geometry. Both theoretical analysis and experimental results show that the Spherical Cautious Optimizers guarantee monotonic descent while significantly improving convergence speed and accuracy in vision and language models. The method is highly general, requires only a one-line code modification, and is applicable to general manifold constraints.
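The abstract's recipe (tangent-space projection, sign-consistency masking, then retraction) can be sketched as follows. This is a minimal illustrative NumPy sketch assembled from the abstract's description only, not the authors' implementation; the function name, the use of a plain SGD-style update, and normalization as the retraction are all assumptions.

```python
import numpy as np

def spherical_cautious_step(w, grad, update, lr=0.1):
    """Hypothetical sketch of one 'spherical cautious' update.

    Assumed from the abstract: (1) project the raw gradient and the
    optimizer's candidate update onto the tangent space at w, so radial
    noise cannot flip coordinate signs; (2) mask coordinates where the
    two disagree in sign; (3) retract back to the unit sphere (here, by
    renormalization).
    """
    w = w / np.linalg.norm(w)                 # ensure w lies on the sphere
    g_t = grad - np.dot(grad, w) * w          # tangent-space gradient
    u_t = update - np.dot(update, w) * w      # tangent-space candidate update
    mask = (g_t * u_t > 0).astype(w.dtype)    # keep only sign-consistent coords
    w_new = w - lr * mask * u_t               # masked step in the tangent space
    return w_new / np.linalg.norm(w_new)      # retraction: renormalize

# Usage: a step should keep the parameter on the unit sphere.
w = np.array([1.0, 0.0, 0.0])
g = np.array([0.0, 1.0, -1.0])
w2 = spherical_cautious_step(w, g, g)  # update == gradient (plain SGD case)
assert abs(np.linalg.norm(w2) - 1.0) < 1e-9
```

With a momentum-style optimizer, `update` would be the optimizer's candidate step (e.g., the Adam direction) rather than the raw gradient, which is where the sign-consistency mask becomes non-trivial.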
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 3