Compute-Optimal Training as Stochastic Optimal Control

Published: 29 May 2026, Last Modified: 29 May 2026. Venue: HiLD at ICML 2026 Poster. Readers: Everyone. License: CC BY 4.0
Keywords: stochastic optimal control, Hamilton-Jacobi-Bellman PDE, learning rate scheduling, batch size adaptation, compute-optimal training, Deep Galerkin Method
TL;DR: We solve the Hamilton-Jacobi-Bellman equation for jointly optimal learning rate and batch size schedules, finding that batch size should grow monotonically and that constant schedules are near-optimal in noise-dominated regimes.
Abstract: Learning rate and batch size schedules for neural network training are typically chosen via expensive grid search over heuristic families. We formulate compute-optimal training as a stochastic optimal control problem in which the state comprises the per-eigenmode loss coefficients from the continuous-time scaling model of Bordelon et al. (2024) together with the remaining compute budget, and the controls are the learning rate $\eta(t)$ and batch size $B(t)$. The resulting Hamilton-Jacobi-Bellman (HJB) equation characterizes the globally optimal schedule. We solve this equation using the Deep Galerkin Method with policy iteration, a mesh-free approach that handles the 12-dimensional PDE. On the random feature model, the HJB framework yields two insights: (1) in the noise-dominated regime, a well-tuned constant schedule is near-optimal, which the HJB solution explains by showing that the value function's gradient structure leaves little room for improvement; (2) the optimal batch size grows monotonically, a result derived from first principles via the optimality conditions rather than from empirical observation (Smith et al., 2018). Joint learning-rate-and-batch-size control provides 2–6% gains over learning-rate-only optimization, with the largest benefit on hard tasks, where noise management is most impactful.
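For orientation, the general shape of such an HJB equation can be sketched as follows. This is a generic textbook form under stated assumptions, not the paper's exact equation: the drift $f$, the diffusion $\Sigma$, and the terminal loss $\ell$ are placeholders, and the assumption that the remaining compute $c$ is consumed at a rate equal to the batch size $B$ is ours for illustration.

```latex
% Assumed controlled dynamics (placeholders f, \Sigma; not taken from the paper):
%   x = per-eigenmode loss coefficients, c = remaining compute budget
\begin{aligned}
  dx_t &= f(x_t, \eta_t, B_t)\,dt + \Sigma(x_t, \eta_t, B_t)\,dW_t, \\
  dc_t &= -B_t\,dt .
\end{aligned}

% Value function: expected final loss when the budget runs out at time \tau
V(x, c) = \min_{\eta(\cdot),\,B(\cdot)}
          \mathbb{E}\bigl[\,\ell(x_\tau) \,\big|\, x_0 = x,\ c_0 = c\,\bigr],
\qquad \tau = \inf\{t \ge 0 : c_t = 0\}.

% Generic HJB equation with boundary condition at zero remaining compute
0 = \min_{\eta,\,B}\Bigl\{
      f(x,\eta,B)\cdot\nabla_x V
      + \tfrac{1}{2}\operatorname{tr}\!\bigl(\Sigma\Sigma^{\top}\nabla_x^{2} V\bigr)
      - B\,\partial_c V
    \Bigr\},
\qquad V(x, 0) = \ell(x).
```

In this form the minimization is over both controls at every state, so the optimal $\eta$ and $B$ are functions of the current loss coefficients and the remaining budget rather than fixed schedules chosen in advance.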
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 94