Abstract: The error of supervised learning is typically split into three components: approximation, estimation, and optimization errors. While all three have been extensively studied in the literature, a unified treatment is less common, in part because of conflicting assumptions. Current approximation results rely on carefully hand-crafted weights or practically unavailable information, which are difficult to obtain by gradient descent. Optimization theory is best understood in over-parametrized regimes with more weights than samples, while classical estimation errors require the opposite regime, with more samples than weights.
This paper contains two results that bound all three error components simultaneously for (non-convex) training of the second-to-last layer of deep fully connected networks on the unit sphere. The first uses a regular least-squares loss and shows convergence in the under-parametrized regime. The second uses a kernel-based loss function and shows convergence in both under- and over-parametrized regimes.
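To make the setting concrete, the following is a minimal sketch, assuming PyTorch, of the training setup described above: data sampled uniformly on the unit sphere, all layers frozen except the second-to-last, and a plain least-squares loss. The architecture, width, target function, and hyperparameters are illustrative placeholders, not taken from the paper or the linked repository.

import torch
import torch.nn as nn

torch.manual_seed(0)

d, width, n = 10, 200, 1000          # input dimension, layer width, sample size (illustrative)

# Uniform samples on the unit sphere S^{d-1}: normalize Gaussian draws.
X = torch.randn(n, d)
X = X / X.norm(dim=1, keepdim=True)
y = torch.sin(X[:, 0]).unsqueeze(1)  # placeholder target function

# Deep fully connected network; only the second-to-last linear layer is trained.
model = nn.Sequential(
    nn.Linear(d, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),   # second-to-last layer (non-convex through the ReLU)
    nn.Linear(width, 1),
)
for p in model.parameters():
    p.requires_grad_(False)
second_to_last = model[2]
for p in second_to_last.parameters():
    p.requires_grad_(True)

opt = torch.optim.SGD(second_to_last.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()                    # regular least-squares loss

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")

Only the parameters of the second-to-last layer receive gradients, so the optimization is non-convex through the subsequent ReLU while the remaining layers stay at their random initialization; the kernel-based loss of the second result would replace nn.MSELoss above.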
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=nnTKcGNrbV
Changes Since Last Submission: We incorporated the editor's requests as follows:
1. Added "... for (non-convex) training of the second-to-last layer of deep fully connected networks on the unit sphere." to the abstract.
2. Added "For simplicity, we confine the data to uniform samples on the unit sphere and train only the non-convex second-to-last layer." at the beginning of Section 1.2, before "Less Over-Parametrization".
3. Added Section 2.6 with a discussion of the two assumptions, as requested by the editor. This slightly extends the review replies and includes references to the proof sketch.
Code: https://github.com/gtwe/nnaos
Assigned Action Editor: ~Martha_White1
Submission Number: 3868