Keywords: Sobolev training, Gradient flow, Convergence acceleration, ReLU networks
TL;DR: We show that Sobolev training provably accelerates the convergence of Rectified Linear Unit (ReLU) networks and quantify such 'Sobolev acceleration'.
Abstract: $\textit{Sobolev training}$, which integrates target derivatives into the loss functions, has been shown to accelerate convergence and improve generalization compared to conventional $L^2$ training. However, the underlying mechanisms of this training method remain incompletely understood. In this work, we show that Sobolev training provably accelerates the convergence of Rectified Linear Unit (ReLU) networks and quantify such `Sobolev acceleration' within the student--teacher framework. Our analysis builds on an analytical formula for the population gradients and Hessians of ReLU networks under centered spherical Gaussian input. Extensive numerical experiments validate our theoretical findings and show that the benefits of Sobolev training extend to modern deep learning tasks, including diffusion models.
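To make the loss described in the abstract concrete, here is a minimal NumPy sketch (hypothetical illustration, not the authors' code) of a Sobolev loss for a one-hidden-layer ReLU network: the usual $L^2$ term plus a first-derivative mismatch term, with all names (`net`, `sobolev_loss`, etc.) invented for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(x, W, a):
    # One-hidden-layer ReLU network on scalar inputs:
    # f(x) = sum_k a_k * relu(w_k * x)
    return relu(np.outer(x, W)) @ a

def net_grad(x, W, a):
    # Derivative of f w.r.t. the input x:
    # f'(x) = sum_k a_k * w_k * 1[w_k * x > 0]
    return ((np.outer(x, W) > 0.0) * W) @ a

def sobolev_loss(x, W, a, target, target_grad):
    # L^2 term (function values) plus H^1 seminorm term (first derivatives);
    # plain L^2 training would use only the first term.
    l2 = np.mean((net(x, W, a) - target(x)) ** 2)
    h1 = np.mean((net_grad(x, W, a) - target_grad(x)) ** 2)
    return l2 + h1
```

In the student--teacher setting analyzed in the paper, `target` and `target_grad` would come from a teacher network of the same form, so the loss vanishes exactly when the student matches the teacher.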
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 23675