Keywords: Deep Learning, Machine Learning, Weight Initialization
Abstract: Random initialization of deep feedforward networks can cause vanishing and exploding gradients. As network depth increases, adapting the standard deviation of the weights at initialization mitigates but does not solve this issue. This problem has led to several architectural modifications, notably residual connections and normalization layers. In this work, we return to the original problem of poor statistical signal propagation in MLPs and propose an alternative that stabilizes both the forward and backward passes at arbitrary depths. Our approach is similar to orthogonal initialization, yet it is cheaper to implement and is based on maximum length sequences (M-seq): pseudo-random binary sequences generated by a linear-feedback shift register.
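As an illustration of the building block named in the abstract, the following minimal sketch generates one period of a maximum length sequence with a 4-bit Fibonacci linear-feedback shift register. It is not the initialization scheme proposed in the paper, whose details are not given here. The function name `mseq`, the register width, the tap positions (4, 3), and the seed are assumptions chosen for illustration only.

```python
def mseq(taps=(4, 3), nbits=4, seed=1):
    """Return one period (2**nbits - 1 bits) of an LFSR output sequence.

    Hypothetical helper: taps (4, 3) on a 4-bit register are an
    illustrative choice known to give a maximum length sequence.
    """
    # Unpack the seed into a list of bits, least significant bit first.
    state = [(seed >> i) & 1 for i in range(nbits)]
    if not any(state):
        raise ValueError("seed must be non-zero, or the register stays at zero")
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])            # output the last register cell
        feedback = 0
        for t in taps:                   # XOR the tapped cells (1-indexed)
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift and insert the feedback bit
    return out


bits = mseq()
# Map bits {0, 1} to signs {+1, -1}, a common form for binary sequences.
signs = [1 - 2 * b for b in bits]
```

Two properties make such sequences attractive as a cheap stand-in for randomness: over one period the numbers of ones and zeros differ by exactly one, and the circular autocorrelation of the ±1 sequence equals -1 at every non-zero shift.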
Submission Number: 203