Batch Normalization Is Blind to the First and Second Derivatives of the Loss w.r.t. Features

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Batch Normalization, Deep Learning Theory, Neural Networks
TL;DR: By expanding the loss function as a Taylor series w.r.t. the output of the BN operation, we prove that the BN operation blocks the back-propagation of the first and second derivatives of the loss function.
Abstract: We prove that when the loss function is expanded as a Taylor series, the BN operation blocks the influence of the first-order term and most of the influence of the second-order term of the loss. This is a potential defect of the BN operation. We also find that this problem is caused by the standardization phase of the BN operation. We believe that proving such blindness of a deep model is of significant value for avoiding systemic collapses of a deep model, even though this blindness does not cause significant damage in all applications. Experiments show that the BN operation significantly affects feature representations in specific tasks.
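The abstract attributes the blindness to the standardization phase of BN. A minimal numerical sketch of one facet of this claim is given below: when features are standardized over the batch, the gradient that flows back to the pre-BN features has zero mean over the batch for every channel, so the constant (batch-mean) component of the upstream first-order derivative is removed. The toy loss, tensor shapes, and variable names are illustrative assumptions, not the authors' code or experiments.

```python
# Sketch (assumed setup): gradients through the standardization phase of BN
# have zero batch mean, i.e. part of the first-order signal is blocked.
import torch

torch.manual_seed(0)
x = torch.randn(8, 4, requires_grad=True)   # batch of 8 samples, 4 features

# Standardization phase of BN (per-feature statistics over the batch),
# without the learnable scale and shift.
mu = x.mean(dim=0, keepdim=True)
sigma = x.std(dim=0, unbiased=False, keepdim=True)
y = (x - mu) / (sigma + 1e-5)

# An arbitrary differentiable downstream loss; any function of y would do.
loss = (y * torch.randn_like(y)).sum() + (y ** 2).sum()
loss.backward()

# For every feature, the gradient reaching x sums to ~0 over the batch:
# the batch-mean component of the upstream gradient does not pass through
# the standardization step.
print(x.grad.mean(dim=0))   # close to 0 for each feature, up to float error
```

This only illustrates the zero-mean property of gradients through standardization; the paper's full claim about the first- and second-order Taylor terms is established analytically in the submission itself.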
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning