Width Provably Matters in Optimization for Deep Linear Neural NetworksDownload PDFOpen Website

2019 (modified: 11 Nov 2022)ICML 2019Readers: Everyone
Abstract: We prove that for an $L$-layer fully-connected linear neural network, if the width of every hidden layer is $\widetilde{\Omega}\left(L \cdot r \cdot d_{out} \cdot \kappa^3 \right)$, where $r$ and $...
0 Replies

Loading