Abstract: While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of weight expansion, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of improving generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion, along with extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an indicator of the enhanced generalization capability endowed by dropout, and not merely as a by-product, we studied other methods that achieve weight expansion (resp. contraction) and found that they generally lead to an increased (resp. decreased) generalization ability. This suggests that dropout is an attractive regularizer because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. Section 1 Introduction: we rewrote some paragraphs to present the supporting evidence for the two hypotheses.
2. Section 3 Dropout and weight expansion: we added the assumptions to the lemmas to make them more explicit, and rewrote some paragraphs to give a one-step and a multi-step analysis.
3. Section 5 Experiments: (1) we report the time consumed in estimating the weight volume; (2) we corrected the statement of the measurement used for the complexity measures; (3) we show how to estimate the sharpness PAC-Bayes sigma; (4) we added the limitations of our methods.
4. Section 6 Discussion: we reorganized this section into four parts: 6.1 Related work about dropout; 6.2 Related work about other generalization factors; 6.3 Related work about PAC-Bayes; 6.4 Weight expansion and flatness. In particular, we clarified the connection between weight volume and flatness in Section 6.4.
5. Section 7 Conclusion and future work: we reorganized the conclusion so that it follows a logical order and made the statements clearer.
6. Appendix M.2: we show the complexity measure experiments with the average sign-error measurement from Dziugaite et al. (2020b).
7. Appendix M.3: we show the true distribution of some weights, obtained using a sampling method.
Assigned Action Editor: ~Yaoliang_Yu1