- Abstract: Over-parameterized networks are widely believed to have a nice loss landscape, but what can be proved rigorously? In this work, we prove that: (i) from under-parameterized to over-parameterized networks, there is a phase transition from having sub-optimal basins to having no sub-optimal basins; (ii) over-parameterization alone cannot eliminate bad non-strict local minima. Specifically, we prove that for any continuous activation function, the loss surface of a class of over-parameterized networks has no sub-optimal basin, where a “basin” is defined as a setwise strict local minimum. Furthermore, for under-parameterized networks, we construct a loss landscape with a strict local minimum that is not global. We then show that it is impossible to prove that “all over-parameterized networks have no sub-optimal local minima”, by giving counter-examples for 1-hidden-layer networks with a class of neurons. Viewing the various bad patterns of the landscape as illnesses (bad basins, flat regions, etc.), our results indicate that over-parameterization is not a panacea for every “illness” of the landscape, but it can cure one practically annoying illness (bad basins).