Abstract: The error of supervised learning is typically split into three components: approximation, estimation, and optimization errors. While all three have been extensively studied in the literature, a unified treatment is less frequent, in part because of conflicting assumptions: approximation results typically rely on carefully hand-crafted weights, which are difficult to achieve by gradient descent; optimization theory is best understood in over-parametrized regimes with more weights than samples, while classical estimation errors typically require the opposite regime, with more samples than weights.
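As a reminder of the standard decomposition the abstract refers to (the notation here is assumed for illustration, not taken from the paper): with $R$ the risk, $f^*$ the target predictor, $\mathcal{F}$ the network class, $f_{\mathcal{F}}$ a risk minimizer over $\mathcal{F}$, $f_n$ the empirical risk minimizer, and $\hat f_n$ the predictor returned by the optimizer, the excess risk telescopes as
$$
R(\hat f_n) - R(f^*)
= \underbrace{R(\hat f_n) - R(f_n)}_{\text{optimization error}}
+ \underbrace{R(f_n) - R(f_{\mathcal{F}})}_{\text{estimation error}}
+ \underbrace{R(f_{\mathcal{F}}) - R(f^*)}_{\text{approximation error}}.
$$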
This paper contains two results that bound all three error components simultaneously for deep fully connected networks. The first uses a standard least squares loss and shows convergence in the under-parametrized regime. The second uses a kernel-based loss function and shows convergence in both the under- and over-parametrized regimes.
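For orientation only, a minimal sketch of the first setting: training a deep fully connected network on a least squares loss by gradient descent. This is not the paper's construction; the architecture, synthetic data, and hyperparameters below are illustrative assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data: n samples of dimension d (assumed values).
n, d = 200, 5
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))  # arbitrary smooth target

# Deep fully connected network; width and depth chosen for illustration.
width, depth = 64, 3
layers = [nn.Linear(d, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]
model = nn.Sequential(*layers)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()  # least squares loss

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()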
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Martha_White1
Submission Number: 2741