Abstract: The error of supervised learning is typically split into three components: approximation, estimation, and optimization errors. While all three have been extensively studied in the literature, a unified treatment is less frequent, in part because of conflicting assumptions: approximation results typically rely on carefully hand-crafted weights, which are difficult to achieve by gradient descent; optimization theory is best understood in over-parametrized regimes with more weights than samples, while classical estimation errors typically require the opposite regime, with more samples than weights.
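As a reminder of the standard decomposition the abstract refers to (the notation here is assumed for illustration, not taken from the paper): with $R$ the risk, $f^*$ the target predictor, $\mathcal{F}$ the network class, $f_{\mathcal{F}}$ a risk minimizer over $\mathcal{F}$, $f_n$ the empirical risk minimizer, and $\hat f_n$ the predictor returned by the optimizer, the excess risk telescopes as
$$
R(\hat f_n) - R(f^*)
= \underbrace{R(\hat f_n) - R(f_n)}_{\text{optimization error}}
+ \underbrace{R(f_n) - R(f_{\mathcal{F}})}_{\text{estimation error}}
+ \underbrace{R(f_{\mathcal{F}}) - R(f^*)}_{\text{approximation error}}.
$$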
This paper contains two results that bound all three error components simultaneously for deep fully connected networks. The first uses a standard least squares loss and shows convergence in the under-parametrized regime. The second uses a kernel-based loss function and shows convergence in both the under- and over-parametrized regimes.
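For orientation only, a minimal sketch of the first setting: training a deep fully connected network on a least squares loss by gradient descent. This is not the paper's construction; the architecture, synthetic data, and hyperparameters below are illustrative assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data: n samples of dimension d (assumed values).
n, d = 200, 5
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))  # arbitrary smooth target

# Deep fully connected network; width and depth chosen for illustration.
width, depth = 64, 3
layers = [nn.Linear(d, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]
model = nn.Sequential(*layers)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()  # least squares loss

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()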
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Martha_White1
Submission Number: 2741