Keywords: Initialization, Identity Matrix, Dynamical Isometry
TL;DR: A simple and general method for stable training
Abstract: A well-conditioned initialization is beneficial for training deep neural networks. However, existing initialization approaches do not offer robustness and universality simultaneously. Specifically, although the widely used Xavier and Kaiming initializations generally fit a variety of networks, they fail to train residual networks without Batch Normalization because they yield an inappropriate scale for the data flow. On the other hand, some works design stable initializations (e.g., Fixup and ReZero) based on dynamical isometry, an efficient learning mechanism. Nonetheless, these methods are designed specifically for either non-residual structures or residual blocks only, and some require extra auxiliary components, limiting their applicable range. Intriguingly, we find that the identity matrix is a feasible and universal solution to the aforementioned problems, as it satisfies dynamical isometry while remaining applicable to a wide range of models. Motivated by this, we develop Identical Initialization (IDInit), a sufficiently robust, universal, and fast-converging approach based on the identity matrix. Empirical results on a variety of benchmarks show that IDInit applies universally to various network types and is practically useful, with good performance and fast convergence.
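A minimal sketch of the identity-initialization idea described above, assuming a square PyTorch Linear layer; the dimensions, layer choice, and usage here are illustrative only, and the paper's actual IDInit procedure (e.g., handling non-square weights, convolutions, and residual blocks) is not reproduced here.

```python
import torch
import torch.nn as nn

# Illustrative sketch: set a square Linear layer's weight to the identity and
# its bias to zero, so the layer starts as an identity map. All singular
# values of the weight are 1, which is consistent with dynamical isometry.
layer = nn.Linear(64, 64)
with torch.no_grad():
    nn.init.eye_(layer.weight)  # identity weight (illustrative; not the paper's full IDInit)
    layer.bias.zero_()

x = torch.randn(8, 64)
# At initialization the layer passes inputs through unchanged.
assert torch.allclose(layer(x), x, atol=1e-6)
```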
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip