Sobolev acceleration for neural networks

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Sobolev Training, Convergence Acceleration, Neural Networks, Gradient Flow
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Our study contributes to a deeper understanding of the dynamics of ReLU networks in the student-teacher setting and highlights the convergence acceleration achieved through Sobolev training, known as Sobolev acceleration.
Abstract: Sobolev training for neural networks, a technique that integrates target derivatives into the training process, has demonstrated significantly faster convergence towards lower test errors than conventional loss functions. However, this effect has not yet been comprehensively understood. This paper presents analytical evidence that Sobolev training accelerates the convergence of rectified linear unit (ReLU) networks in the student-teacher framework. The analysis builds upon the analytical formula for the population gradients of ReLU networks with centered spherical Gaussian input. Furthermore, numerical examples show that the results extend to multi-layered neural networks with various activation functions and architectures. Finally, we propose Chebyshev spectral differentiation as a means of approximating target derivatives, addressing prior limitations on the use of approximated derivatives. Overall, this study contributes to a deeper understanding of the dynamics of ReLU networks in the student-teacher setting and highlights the convergence acceleration achieved through Sobolev training, known as Sobolev acceleration.
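
To make the training objective concrete, here is a minimal sketch of a Sobolev-style loss for a scalar-output network in PyTorch. This is an illustrative stand-in, not the paper's implementation: the function name `sobolev_loss` and the weighting factor `lam` are assumptions, and the derivative term is computed with autograd.

```python
# Hypothetical sketch of a Sobolev training loss (not the authors' code).
import torch

def sobolev_loss(model, x, y_target, dy_target, lam=1.0):
    """L2 loss on function values plus an L2 penalty on input derivatives."""
    x = x.requires_grad_(True)
    y_pred = model(x)
    # Derivative of the (scalar, per-sample) output w.r.t. the input;
    # create_graph=True keeps the term differentiable for the optimizer step.
    dy_pred = torch.autograd.grad(y_pred.sum(), x, create_graph=True)[0]
    value_loss = torch.mean((y_pred - y_target) ** 2)
    deriv_loss = torch.mean((dy_pred - dy_target) ** 2)
    return value_loss + lam * deriv_loss
```

Setting `lam=0` recovers the conventional L2 loss, so `lam` controls how strongly the target-derivative information enters training.

Similarly, when true target derivatives are unavailable, the abstract's proposal can be illustrated with a small NumPy sketch of Chebyshev spectral differentiation. Again a hedged example under assumed names (`chebyshev_derivative`, the degree `deg`), not the paper's exact procedure:

```python
# Hypothetical sketch: approximate target derivatives from function samples
# via a fitted Chebyshev series (not the paper's exact procedure).
import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_derivative(x, y, deg=32):
    """Fit a degree-`deg` Chebyshev series to samples (x, y) and
    return its derivative evaluated at x."""
    series = C.Chebyshev.fit(x, y, deg)
    return series.deriv()(x)

# Example: recover the derivative of sin on [-1, 1] from values alone.
x = np.cos(np.pi * np.arange(65) / 64)  # Chebyshev-Lobatto points
dy_approx = chebyshev_derivative(x, np.sin(x))
assert np.allclose(dy_approx, np.cos(x), atol=1e-8)
```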
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1589