Abstract: Understanding learning with deep architectures has been a
major research objective in recent years, with notable theoretical
progress. A main focal point of these studies is the success
of excessively large networks. We study empirically the layer-wise functional
structure of overparameterized deep models and provide evidence for the
heterogeneous characteristics of layers. To do so, we introduce the notion of
(post-training) re-initialization and re-randomization robustness. We show
that layers can be categorized as either ``robust'' or
``critical''. In contrast to critical layers, resetting the robust layers to their
initial values has no negative consequence, and in many cases they barely change
throughout training. Our study provides evidence that
flatness or robustness analysis of the model parameters needs to respect the
network architecture.
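
As an illustration of the robustness checks described above, here is a minimal sketch (not the authors' released code). It assumes a trained PyTorch module, an `init_state` state dict saved before training, and a hypothetical `evaluate(model)` callable returning test accuracy; for simplicity it resets one parameter tensor at a time rather than a whole layer.

```python
import copy
import torch

@torch.no_grad()
def layer_robustness(model, init_state, evaluate):
    """Per-parameter re-initialization / re-randomization robustness sketch.

    model      -- a trained torch.nn.Module (assumption)
    init_state -- model.state_dict() captured *before* training (assumption)
    evaluate   -- hypothetical callable returning test accuracy of `model`
    """
    trained_state = copy.deepcopy(model.state_dict())
    results = {}
    for name, param in model.named_parameters():
        # (a) re-initialization: reset this parameter to its value at init
        param.copy_(init_state[name])
        reinit_acc = evaluate(model)
        param.copy_(trained_state[name])  # restore trained weights

        # (b) re-randomization: fresh random values at the init's scale
        #     (a crude stand-in for re-sampling the init distribution)
        param.copy_(torch.randn_like(param) * init_state[name].std())
        rerand_acc = evaluate(model)
        param.copy_(trained_state[name])  # restore trained weights

        # robust layers keep accuracy close to the trained model's;
        # critical layers do not
        results[name] = (reinit_acc, rerand_acc)
    return results
```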