Initializing ReLU networks in an expressive subspace of weights

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Signal propagation, deep ReLU networks, mean-field theory, improved initialization
Abstract: Using a mean-field theory of signal propagation, we analyze the evolution of correlations between two signals propagating forward through a deep ReLU network with correlated weights. In deep ReLU networks with uncorrelated weights, signals become highly correlated with depth. We show that ReLU networks with anti-correlated weights can avoid this fate and exhibit a chaotic phase in which signal correlations saturate below unity. Consistent with this analysis, we find that networks initialized with anti-correlated weights can train faster by exploiting the increased expressivity of the chaotic phase. An initialization scheme that combines this with a previously proposed asymmetric initialization, which reduces the probability of dead nodes, consistently yields lower training times than various other initializations on synthetic and real-world datasets. Our study suggests that initial weight distributions with built-in correlations can help reduce training time.
One-sentence Summary: ReLU networks initialized with asymmetric anti-correlated weights learn faster.
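
The page does not include code, so the sketch below is only one plausible way to realize an anti-correlated initialization, not the authors' scheme. It mean-centers an i.i.d. Gaussian draw along the fan-in axis, which gives each unit's incoming weights a pairwise correlation of -1/(fan_in - 1); the function name `anticorrelated_init`, the He-style variance scale, and this particular correlation structure are all assumptions for illustration, and the asymmetric component of the proposed initialization is not reproduced here.

```python
import numpy as np

def anticorrelated_init(fan_in, fan_out, gain=np.sqrt(2.0), rng=None):
    """Illustrative anti-correlated weight initializer (an assumption,
    not the paper's exact scheme).

    Mean-centering an i.i.d. Gaussian draw along the fan-in axis gives
    each unit's incoming weights a pairwise correlation of
    -1 / (fan_in - 1), the most negative value an equicorrelated
    Gaussian vector can attain.
    """
    rng = np.random.default_rng() if rng is None else rng
    # He-style scale for ReLU networks: Var(w) ~ gain^2 / fan_in.
    g = rng.normal(0.0, gain / np.sqrt(fan_in), size=(fan_out, fan_in))
    # Subtracting the per-unit mean makes each row sum to zero,
    # inducing negative pairwise correlations within each row.
    return g - g.mean(axis=1, keepdims=True)

# Example: check the induced correlation empirically (small fan-in,
# many units, so the estimate is accurate).
W = anticorrelated_init(fan_in=8, fan_out=200_000,
                        rng=np.random.default_rng(0))
C = np.corrcoef(W, rowvar=False)               # 8 x 8 correlation matrix
off_diag = C[~np.eye(8, dtype=bool)].mean()
print(off_diag)                                # close to -1/7 ~ -0.143
```

Mean-centering is simply the easiest construction that yields equicorrelated negative weights; weaker anti-correlations in (-1/(fan_in - 1), 0) could instead be obtained by blending a centered draw with an independent i.i.d. one.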