Keywords: Model generalization, empirical risk minimization, classification
TL;DR: A frustratingly easy trick to improve the generalization of deep models on classification tasks by simply enlarging the classification space, supported by both theoretical and empirical analysis.
Abstract: Empirical risk minimization (ERM) is a fundamental machine learning paradigm. However, its generalization ability is limited in various tasks. In this paper, we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general technique to improve the generalization of ERM. DuRM is extremely simple to implement: just enlarge the dimension of the output logits and then optimize using standard gradient descent. Moreover, we validate the efficacy of DuRM through both theoretical and empirical analysis. Theoretically, we show that DuRM yields greater gradient variance, which facilitates model generalization by helping the model find flatter local minima. Empirically, we evaluate DuRM across different datasets, modalities, and network architectures on diverse tasks, including conventional classification, semantic segmentation, out-of-distribution generalization, adversarial training, and long-tailed recognition. Results demonstrate that DuRM consistently improves performance on all tasks in an almost free-lunch manner. The goal of DuRM is not to achieve state-of-the-art performance, but to trigger new interest in fundamental research on risk minimization.
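The following is a minimal PyTorch sketch of the trick as the abstract describes it (enlarging the output logits and training with standard gradient descent). The class name `DummyClassifier`, the `num_dummy` parameter, and the backbone/feature-dimension arguments are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DummyClassifier(nn.Module):
    """Sketch of DuRM-style classification: the output layer is enlarged by
    `num_dummy` extra logits. Ground-truth labels only ever index the first
    `num_classes` outputs; the dummy logits still participate in the softmax
    and thus in the gradient, enlarging the classification space."""

    def __init__(self, backbone: nn.Module, feat_dim: int,
                 num_classes: int, num_dummy: int = 1):
        super().__init__()
        self.backbone = backbone
        # Enlarged classification head: num_classes real + num_dummy dummy logits.
        self.head = nn.Linear(feat_dim, num_classes + num_dummy)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))

# Training is unchanged: standard cross-entropy with ordinary gradient descent.
# model = DummyClassifier(backbone, feat_dim=512, num_classes=10, num_dummy=2)
# logits = model(images)                                # shape: [batch, 10 + 2]
# loss = nn.functional.cross_entropy(logits, labels)    # labels stay in [0, 10)
# loss.backward(); optimizer.step()
```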
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4297