## ON BREIMAN’S DILEMMA IN NEURAL NETWORKS: SUCCESS AND FAILURE OF NORMALIZED MARGINS

27 Sep 2018 (modified: 21 Dec 2018), ICLR 2019 Conference Blind Submission
• Abstract: A long-standing belief in machine learning holds that enlarging the margins over training data explains why models resist overfitting, because larger margins make them more robust. Yet Breiman (1999) exposed a dilemma: a uniform improvement of the margin distribution *does not* necessarily reduce generalization error. In this paper, we revisit Breiman's dilemma in deep neural networks using recently proposed normalized margins, in which the network's Lipschitz constant is bounded by the product of its layers' spectral norms. Through simplified theory and extensive experiments, we show that Breiman's dilemma depends on the dynamics of normalized margin distributions, which reflect the trade-off between model expressive power and data complexity. When the data complexity is comparable to the model's expressive power, in the sense that training and test data share similar phase transitions in their normalized margin dynamics, two efficient methods derived from classic margin-based generalization bounds successfully predict the trend of generalization error. On the other hand, over-expressive models that exhibit uniform improvements in training normalized margins may lose this predictive power and fail to prevent overfitting.
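As a minimal sketch of the normalized margin mentioned in the abstract, assuming a bias-free ReLU multilayer perceptron (a hypothetical setup, not the authors' code), the usual multiclass margin f_y(x) - max_{j≠y} f_j(x) is divided by the product of the layers' spectral norms, which upper-bounds the network's Lipschitz constant because ReLU is 1-Lipschitz. The helper names `forward` and `normalized_margins`, the random weights, and the use of a 10% quantile to summarize the margin distribution are illustrative assumptions.

```python
import numpy as np


def spectral_norm(w):
    # Largest singular value of a weight matrix (ord=2 on a 2-D array).
    return np.linalg.norm(w, ord=2)


def forward(weights, x):
    # Hypothetical bias-free MLP: ReLU after every layer except the last.
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w.T, 0.0)
    return h @ weights[-1].T


def normalized_margins(weights, x, y):
    """Multiclass margin f_y(x) - max_{j != y} f_j(x), divided by the
    product of the layers' spectral norms (a Lipschitz upper bound)."""
    logits = forward(weights, x)
    n = logits.shape[0]
    correct = logits[np.arange(n), y]
    others = logits.copy()
    others[np.arange(n), y] = -np.inf  # exclude the true class from the max
    margin = correct - others.max(axis=1)
    lipschitz_bound = np.prod([spectral_norm(w) for w in weights])
    return margin / lipschitz_bound


# Usage on random data: summarize the normalized margin distribution.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)), rng.standard_normal((3, 16))]
x = rng.standard_normal((100, 8))
y = rng.integers(0, 3, size=100)
gamma = normalized_margins(weights, x, y)
print("10% quantile of normalized margins:", np.quantile(gamma, 0.1))
```

Because ReLU is positively homogeneous, rescaling any single layer by a positive constant scales the raw margin and the spectral-norm product by the same factor, so the normalized margin is unchanged; this scale invariance is the reason for dividing by the Lipschitz bound rather than comparing raw margins.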
• Keywords: Breiman's Dilemma, Generalization Error, Margin, Spectral Normalization
• TL;DR: Breiman's dilemma is shown to arise in deep learning: improving the margins of over-parameterized models may result in overfitting. Dynamics of normalized margin distributions are proposed to predict generalization error and to identify such a dilemma.