Abstract: Understanding learning with deep architectures has been a major research objective in the recent years with notable theoretical progress. A main focal point of those studies stems from the success of excessively large networks. We study empirically the layer-wise functional structure of overparameterized deep models. We provide evidence for the heterogeneous characteristic of layers. To do so, we introduce the notion of (post training) re-initialization and re-randomization robustness. We show that layers can be categorized into either ``robust'' or ``critical''. In contrast to critical layers, resetting the robust layers to their initial value has no negative consequence, and in many cases they barely change throughout training. Our study provides evidence flatness or robustness analysis of the model parameters needs to respect the network architectures.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 3 code implementations](https://www.catalyzex.com/paper/arxiv:1902.01996/code)