A Theoretical Study of the Jacobian Matrix in Deep Neural Networks

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Theory; Deep Neural Networks; Jacobian Matrix
TL;DR: A Theoretical Analysis of the Jacobian Matrix in DNNs beyond Initialization
Abstract: Due to the compositional nature of neural networks, increasing their depth can lead to vanishing or exploding gradients if the initialization scheme is not carefully selected (Poole et al., 2016; Schoenholz et al., 2017; Hayou et al., 2019). One approach to identifying a desirable initialization scheme involves analyzing the behavior of the input-output Jacobian and ensuring that it does not degenerate exponentially with depth. Such an analysis has been conducted in previous works, such as Pennington et al. (2017), where the authors discovered a critical initialization scheme that ensures Jacobian stability, as confirmed by empirical results. The analysis carried out in such studies is limited to initialization and leverages classical results in random matrix theory. In this paper, we extend this analysis beyond initialization and study the behavior of the Jacobian during training. Notably, we show that a notion of stability holds throughout training (if satisfied at initialization), hence providing a theoretical explanation for the crucial role of initialization. To do this, we first prove a general theorem that utilizes recent breakthrough results in random matrix theory (Brailovskaya and van Handel, 2022). To show the broad applicability of our framework, we also provide an analysis of the Jacobian in other scenarios, such as sparse networks and non-i.i.d. initialization.
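As a hedged illustration of the abstract's central object (this is not code from the paper, and all names and parameters below are illustrative assumptions): for a deep ReLU MLP with i.i.d. Gaussian weights, the input-output Jacobian factors as a product of layer terms J = D_L W_L ... D_1 W_1, where D_l collects the activation derivatives. The variance sigma_w^2 = 2 (He initialization) is the critical value at which the Jacobian's squared singular values neither vanish nor explode with depth; the minimal NumPy sketch below probes this numerically at initialization.

```python
# A minimal sketch (assumed setup, not the paper's code): probe how the
# input-output Jacobian of a random deep ReLU MLP scales with depth under
# different i.i.d. Gaussian initialization variances sigma_w^2.
import numpy as np

def jacobian_singular_values(depth, width, sigma_w2, rng):
    """Singular values of the input-output Jacobian at a random input.

    For f(x) = phi(W_L ... phi(W_1 x)) with phi = ReLU, the Jacobian is
    J = D_L W_L ... D_1 W_1, where D_l is the diagonal matrix of ReLU
    derivatives (0 or 1) at layer l.
    """
    x = rng.standard_normal(width)
    J = np.eye(width)
    h = x
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(sigma_w2 / width)
        pre = W @ h
        D = np.diag((pre > 0).astype(float))  # ReLU derivative at this layer
        J = D @ W @ J                          # chain rule: accumulate layer Jacobians
        h = np.maximum(pre, 0.0)
    return np.linalg.svd(J, compute_uv=False)

rng = np.random.default_rng(0)
for sigma_w2 in (1.0, 2.0, 3.0):  # sub-critical, critical, super-critical
    s = jacobian_singular_values(depth=50, width=200, sigma_w2=sigma_w2, rng=rng)
    print(f"sigma_w^2 = {sigma_w2}: mean squared singular value = {np.mean(s**2):.3e}")
```

Under this setup, each ReLU layer multiplies the expected squared Jacobian norm by sigma_w^2 / 2, so the sub-critical run decays and the super-critical run grows exponentially in depth, while the critical run stays of order one; the paper's contribution, per the abstract, is to show that such stability persists during training rather than only at initialization.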
Supplementary Material: zip
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8409