Keywords: Label Smoothing, Regularization, Representation Learning, Explainability
Abstract: Label Smoothing aims to prevent Neural Networks from making over-confident predictions and to improve generalization.
Owing to its effectiveness, it has become an indispensable ingredient of training recipes for tasks such as Image Recognition and Neural Machine Translation. Nevertheless, previous work shows that it encourages an overly tight cluster in the feature space, which "erases" the similarity information of individual examples and thereby impairs representation learning. By decomposing the loss induced by Label Smoothing into a combination of a regularization term and an error-enhancement term, we reveal a previously unknown defect: it in fact encourages classifiers to be over-confident when they make incorrect predictions. To remedy this, we present a solution called Max Suppression (MaxSup), which consistently applies the intended regularization effect during training, independent of the correctness of the prediction. By visualizing the learned features, we show that MaxSup successfully enlarges intra-class variation while improving inter-class separability. We further conduct experiments on Image Classification and Machine Translation tasks, validating the superiority of Max Suppression. The code implementation is available at [anonymous repository](https://anonymous.4open.science/r/Maximum-Suppression-Regularization-DB0C).
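As a concrete illustration of the idea sketched in the abstract, below is a minimal PyTorch-style sketch of what a MaxSup objective could look like: where Label Smoothing effectively adds a penalty on the ground-truth logit relative to the mean logit (which degenerates into error enhancement when the prediction is wrong), MaxSup penalizes the maximum logit instead, so the suppression applies regardless of correctness. The function name `maxsup_loss`, the weight `alpha`, and the exact form of the penalty are our assumptions for illustration and are not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def maxsup_loss(logits: torch.Tensor, targets: torch.Tensor,
                alpha: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus a Max Suppression penalty (illustrative sketch).

    Label Smoothing with weight alpha adds roughly
        alpha * (z_gt - mean(z))
    to the cross-entropy, which only regularizes when the ground-truth
    logit z_gt is also the top logit. This sketch instead penalizes
        alpha * (z_max - mean(z)),
    suppressing the top logit whether or not the prediction is correct.
    """
    ce = F.cross_entropy(logits, targets)
    z_max = logits.max(dim=-1).values    # top logit per example
    z_mean = logits.mean(dim=-1)         # mean logit per example
    penalty = (z_max - z_mean).mean()    # batch-averaged suppression term
    return ce + alpha * penalty
```

In this hypothetical form, MaxSup is a drop-in replacement for a label-smoothed cross-entropy: when the classifier is correct, `z_max` coincides with the ground-truth logit and the penalty matches the regularization term of Label Smoothing; when it is wrong, the penalty still shrinks the (incorrect) top logit instead of amplifying the error.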
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3323