Abstract: Label smoothing regularization (LSR) is a prevalent component for training deep neural networks and can effectively improve the generalization of models. Despite its empirical success, the theoretical understanding of the power of label smoothing, especially of its influence on optimization, is still limited. In this work, we, for the first time, theoretically analyze the convergence behavior of stochastic gradient descent with label smoothing in deep learning. Our analysis indicates that an appropriate LSR can speed up convergence by reducing the variance of the stochastic gradient, which provides a theoretical interpretation of the effectiveness of LSR. Moreover, the analysis implies that LSR may slow down convergence at the end of optimization. Therefore, a novel algorithm, namely Two-Stage LAbel smoothing (TSLA), is proposed to further improve convergence. Through extensive analysis and experiments on benchmark datasets, the effectiveness of TSLA is verified both theoretically and empirically.
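The abstract suggests that LSR helps early in training but may slow convergence near the end, motivating a two-stage schedule. Below is a minimal sketch, not the authors' implementation, of label smoothing regularization together with an illustrative two-stage loop in that spirit: smoothing is applied for the first `switch_epoch` epochs and then dropped. The smoothing level `epsilon`, the switch point, and the training-loop details are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of LSR and a two-stage (smooth-then-drop) schedule.
# Assumptions: epsilon, switch_epoch, and the training loop are hypothetical.
import torch
import torch.nn.functional as F


def smoothed_cross_entropy(logits, targets, epsilon=0.1):
    """Cross entropy against the smoothed target distribution:
    (1 - epsilon) * one_hot(y) + epsilon / num_classes."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)  # cross entropy with the uniform component
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()


def train(model, loader, optimizer, num_epochs=90, switch_epoch=60, epsilon=0.1):
    """Stage one: train with label smoothing; stage two: switch to one-hot labels."""
    for epoch in range(num_epochs):
        use_lsr = epoch < switch_epoch  # smoothing on in stage one, off in stage two
        for x, y in loader:
            optimizer.zero_grad()
            logits = model(x)
            if use_lsr:
                loss = smoothed_cross_entropy(logits, y, epsilon)
            else:
                loss = F.cross_entropy(logits, y)
            loss.backward()
            optimizer.step()
```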