A $p$-adic Perspective on Low-Bit Training of Neural Networks

Published: 29 May 2026, Last Modified: 29 May 2026
Venue: HiLD at ICML 2026 Poster
License: CC BY 4.0
Keywords: finite-precision learning dynamics, low-bit optimization, quantized neural network training, high-dimensional optimization, loss landscape geometry, p-adic methods
TL;DR: Recasting low-bit neural network training as p-adic root finding and solving it via Hensel lifting instead of gradient descent.
Abstract: We investigate low-bit neural network training from a $p$-adic perspective. In extreme low-bit regimes, the set of representable values is so small that gradient descent operates on an essentially discrete domain, making continuous analysis inadequate. This observation motivates us to model a neural network as a polynomial system over the integers modulo $p^N$. Activations and losses are replaced by piecewise polynomial approximations, and training is recast as finding roots of the resulting system. Hensel's lemma provides an iterative procedure for lifting seed roots digit-by-digit to the required precision. We formalize this approach and demonstrate its feasibility on linear regression and shallow polynomial networks.
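As an illustration of the digit-by-digit lifting mentioned in the abstract, here is a minimal Python sketch of Hensel lifting for a single univariate integer polynomial. It is not the authors' implementation: the function names (`poly_eval`, `poly_deriv`, `hensel_lift`), the coefficient ordering, and the example polynomial $x^2 - 2$ are all assumptions made for illustration, and the multivariate systems arising from a network would require a Jacobian-based analogue that is not shown here.

```python
def poly_eval(coeffs, x, m):
    """Evaluate a polynomial at x modulo m (Horner's rule).

    coeffs lists integer coefficients from highest degree down.
    """
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % m
    return acc


def poly_deriv(coeffs):
    """Return the coefficients of the formal derivative."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def hensel_lift(coeffs, seed, p, N):
    """Lift a simple root modulo p to a root modulo p**N, one digit at a time.

    Hypothetical helper: assumes f(seed) = 0 (mod p) and f'(seed) != 0 (mod p).
    """
    if poly_eval(coeffs, seed, p) != 0:
        raise ValueError("seed is not a root modulo p")
    d = poly_eval(poly_deriv(coeffs), seed, p)
    if d == 0:
        raise ValueError("derivative vanishes modulo p; root is not simple")
    inv = pow(d, -1, p)  # modular inverse of f'(seed) mod p (Python 3.8+)

    r = seed % p
    for k in range(1, N):
        pk = p ** k
        # f(r) is divisible by p**k; q is the next base-p digit of f(r).
        q = poly_eval(coeffs, r, pk * p) // pk
        # Choose digit t so that f(r + t * p**k) = 0 (mod p**(k+1)).
        t = (-q * inv) % p
        r += t * pk
    return r


# Example: lift the root 3 of x^2 - 2 modulo 7 to a root modulo 7**3.
root = hensel_lift([1, 0, -2], seed=3, p=7, N=3)
```

In this example the seed 3 satisfies $3^2 - 2 = 7 \equiv 0 \pmod 7$, and the lifted value is 108, since $108^2 - 2 = 11662 = 34 \cdot 343$. Each pass of the loop fixes one more base-$p$ digit of the root, which mirrors the precision-by-precision view described in the abstract.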
Submission Number: 83