Batch Normalization Is Blind to the First and Second Derivatives of the Loss w.r.t. Features

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Batch Normalization, Deep Learning Theory, Neural Networks
TL;DR: By expanding the loss function as a Taylor series w.r.t. the output of the BN operation, we prove that the BN operation blocks the back-propagation of the first and second derivatives of the loss function.
Abstract: We prove that when the loss function is expanded as a Taylor series, the BN operation blocks the influence of the first-order term and most of the influence of the second-order term of the loss. This is a potential defect of the BN operation. We also find that this problem is caused by the standardization phase of the BN operation. We believe that proving such blindness of a deep model is of significant value for avoiding systemic collapses of a deep model, even though this blindness does not cause significant damage in all applications. Experiments show that the BN operation significantly affects feature representations in specific tasks.
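The abstract attributes the blindness to the standardization phase of BN. A minimal numerical sketch of one facet of this claim is given below: when features are standardized over the batch, the gradient that flows back to the pre-BN features has zero mean over the batch for every channel, so the constant (batch-mean) component of the upstream first-order derivative is removed. The toy loss, tensor shapes, and variable names are illustrative assumptions, not the authors' code or experiments.

```python
# Sketch (assumed setup): gradients through the standardization phase of BN
# have zero batch mean, i.e. part of the first-order signal is blocked.
import torch

torch.manual_seed(0)
x = torch.randn(8, 4, requires_grad=True)   # batch of 8 samples, 4 features

# Standardization phase of BN (per-feature statistics over the batch),
# without the learnable scale and shift.
mu = x.mean(dim=0, keepdim=True)
sigma = x.std(dim=0, unbiased=False, keepdim=True)
y = (x - mu) / (sigma + 1e-5)

# An arbitrary differentiable downstream loss; any function of y would do.
loss = (y * torch.randn_like(y)).sum() + (y ** 2).sum()
loss.backward()

# For every feature, the gradient reaching x sums to ~0 over the batch:
# the batch-mean component of the upstream gradient does not pass through
# the standardization step.
print(x.grad.mean(dim=0))   # close to 0 for each feature, up to float error
```

This only illustrates the zero-mean property of gradients through standardization; the paper's full claim about the first- and second-order Taylor terms is established analytically in the submission itself.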
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning