Adaptive Label Smoothing with Self-Knowledge

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 (ICLR 2022 Submission)
Keywords: Regularization, Model Calibration, Adaptive Label Smoothing, Self-Knowledge Distillation, Overconfidence, Natural Language Generation
Abstract: Overconfidence has been shown to impair the generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to the loss function, preventing the model from producing a peaked distribution. Label smoothing smooths target labels with a predefined prior label distribution; as a result, the model is trained to maximize the likelihood of predicting the soft labels. Nonetheless, the amount of smoothing is the same for all samples and remains fixed throughout training. In other words, label smoothing does not reflect the change in the probability distribution mapped by the model over the course of training. To address this issue, we propose a regularization scheme that makes the smoothing parameter dynamic by taking the model's probability distribution into account, thereby varying the parameter per instance. A model in training self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work on bridging label smoothing and knowledge distillation, our method uses self-knowledge as the prior label distribution for softening target labels, and we present theoretical support for the regularization effect of knowledge distillation. Our regularizer is validated comprehensively on various machine translation datasets and outperforms strong baselines by a large margin, not only in model performance but also in model calibration.
One-sentence Summary: We propose adaptive label smoothing with self-knowledge that enhances model performance and calibration.
Supplementary Material: zip
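
The abstract describes per-instance smoothing driven by the model's own predictive distribution. Below is a minimal, hypothetical PyTorch sketch of that idea: each example's smoothing weight is derived from the normalized entropy of the model's own prediction, and the soft target mixes the hard label with the detached self-distribution. The entropy-based rule, the `max_alpha` bound, and the function name are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: adaptive label smoothing with self-knowledge.
# The per-example smoothing weight is an assumed entropy-based rule,
# not necessarily the schedule used in the paper.
import math

import torch
import torch.nn.functional as F


def adaptive_self_label_smoothing_loss(logits: torch.Tensor,
                                       targets: torch.Tensor,
                                       max_alpha: float = 0.2) -> torch.Tensor:
    """Cross-entropy against per-example soft targets.

    logits:    (batch, num_classes) unnormalized model scores.
    targets:   (batch,) integer class indices (hard labels).
    max_alpha: assumed upper bound on the per-example smoothing weight.
    """
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # Self-knowledge: the model's own distribution, detached so it acts
    # as a fixed prior for this forward pass.
    self_probs = log_probs.detach().exp()

    # Assumed rule: smooth more when the model is already confident
    # (low normalized entropy), to counteract overconfidence.
    entropy = -(self_probs * log_probs.detach()).sum(dim=-1)      # (batch,)
    norm_entropy = entropy / math.log(num_classes)
    alpha = max_alpha * (1.0 - norm_entropy)                      # (batch,)

    # Soft targets: mix the one-hot label with the model's own distribution.
    one_hot = F.one_hot(targets, num_classes).float()
    alpha = alpha.unsqueeze(-1)
    soft_targets = (1.0 - alpha) * one_hot + alpha * self_probs

    # Standard soft-target cross-entropy, averaged over the batch.
    return -(soft_targets * log_probs).sum(dim=-1).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(4, 10, requires_grad=True)   # toy batch, 10 classes
    targets = torch.randint(0, 10, (4,))
    loss = adaptive_self_label_smoothing_loss(logits, targets)
    loss.backward()
    print(loss.item())
```

Because the soft target is recomputed at every forward pass from the detached self-distribution, the effective amount of smoothing changes per instance and over the course of training, which is the behavior the abstract motivates.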