Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage

Alan Milligan; Zikun Xu; Simon Lacoste-Julien; Felix Dangel; Wu Lin



Published: 29 May 2026, Last Modified: 29 May 2026. Venue: HiLD at ICML 2026 Poster. License: CC BY 4.0

Keywords: optimization, deep learning, reparameterization, robustness

TL;DR: We show that a mathematically equivalent reparametrization of Shampoo- and SOAP-style optimizers is more robust to lower numerical precision and enables efficient subspace updates of their preconditioners.

Abstract: Shampoo-based methods such as KL-Shampoo and SOAP have demonstrated strong performance in training neural networks and leverage QR decompositions. Because existing QR implementations require single-precision arithmetic and remain computationally expensive, these methods become time- and memory-intensive when their preconditioning matrices are large. Moreover, using half-precision (BFloat16) storage to reduce memory can degrade the performance of Shampoo-based methods. We propose a reparametrization of the preconditioner that supports half-precision storage and also enables efficient QR-based updates in subspaces while retaining single-precision arithmetic, thereby reducing both computational cost and memory overhead. It applies broadly to Shampoo-based methods that employ QR decompositions, including KL-Shampoo and SOAP. Our approach mitigates the performance degradation of these methods under half-precision storage and, overall, makes them more memory- and time-efficient.
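The abstract combines two ingredients that can be illustrated in isolation: a QR-based update of a preconditioner basis carried out in single precision, and BFloat16 storage of that basis. The sketch below is NOT the authors' reparametrization; it only shows the baseline setting the abstract describes, under loudly stated assumptions. A SOAP-style basis refresh is simplified here to one power-iteration step followed by QR. BFloat16 is emulated by rounding float32 bit patterns, because NumPy has no native bfloat16 dtype. The names `to_bf16`, `refresh_basis`, `orthogonality_error` and the random stand-in factor `P` are all hypothetical.

```python
import numpy as np


def to_bf16(x):
    """Round float32 values to the nearest bfloat16 value (round-to-nearest-even).

    Hypothetical helper: the result stays in a float32 container so NumPy can keep
    computing with it, but only the top 16 bits (what BFloat16 storage keeps) are
    non-zero. NaN/Inf handling is ignored in this sketch.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Bias of 0x7FFF plus the lowest kept bit implements ties-to-even rounding.
    bias = ((bits >> 16) & np.uint32(1)) + np.uint32(0x7FFF)
    return ((bits + bias) & np.uint32(0xFFFF0000)).view(np.float32)


def refresh_basis(P, Q_stored):
    """Hypothetical SOAP-style refresh: one power-iteration step, then QR.

    All arithmetic runs in float32, regardless of how Q_stored was kept.
    """
    Q32 = np.asarray(Q_stored, dtype=np.float32)  # upcast the stored basis
    Q_new, _ = np.linalg.qr(np.asarray(P, dtype=np.float32) @ Q32)
    return Q_new


def orthogonality_error(Q):
    """Largest entry of |Q^T Q - I|, evaluated in float64."""
    Q64 = np.asarray(Q, dtype=np.float64)
    return np.abs(Q64.T @ Q64 - np.eye(Q64.shape[1])).max()


rng = np.random.default_rng(0)
n = 32
G = rng.standard_normal((n, 4 * n)).astype(np.float32)
P = (G @ G.T) / G.shape[1]  # hypothetical stand-in for a Shampoo factor, roughly E[G G^T]

Q_fp32 = refresh_basis(P, np.eye(n, dtype=np.float32))  # basis kept in float32
Q_bf16 = to_bf16(Q_fp32)  # the same basis after naive BFloat16 storage

err_fp32 = orthogonality_error(Q_fp32)
err_bf16 = orthogonality_error(Q_bf16)
print(f"float32 basis:  max |Q^T Q - I| = {err_fp32:.2e}")
print(f"bfloat16 basis: max |Q^T Q - I| = {err_bf16:.2e}")
```

Rounding the orthogonal basis to BFloat16 visibly breaks its orthogonality, while the float32 copy stays orthogonal to roughly machine precision. This is only one easily measured effect of low-precision storage; the abstract does not state that it is the mechanism behind the degradation it reports.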

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 138
