Worst-case Feature Risk Minimization for Data-Efficient Learning

Published: 26 Oct 2023, Last Modified: 26 Oct 2023. Accepted by TMLR.
Abstract: Deep learning models typically require massive amounts of annotated data to train a strong model for a task of interest. However, data annotation is time-consuming and costly. How to train a satisfactory model using labeled data from a related but distinct domain, or using just a few samples, is thus an important question. To achieve this goal, models should resist overfitting to the specifics of the training data so that they generalize well to new data. This paper proposes a novel Worst-case Feature Risk Minimization (WFRM) method that helps improve model generalization. Specifically, we solve a minimax optimization problem in feature space at each training iteration: given the input features, we first seek the feature perturbation that maximizes the current training loss, and then minimize the training loss on the resulting worst-case features. Incorporating WFRM during training significantly improves model generalization both under distributional shift, i.e., Domain Generalization (DG), and in the low-data regime, i.e., Few-shot Learning (FSL). We theoretically analyze WFRM and identify the key reason it works better than empirical risk minimization (ERM): it induces an empirical risk-based, semi-adaptive $L_{2}$ regularization of the classifier weights, enabling a better risk-complexity trade-off. We evaluate WFRM on two data-efficient learning tasks: DG, on the three standard benchmarks PACS, VLCS, and OfficeHome, and FSL, on the most challenging FSL benchmark, Meta-Dataset. Despite its simplicity, our method consistently improves various DG and FSL methods, achieving new state-of-the-art performance in all settings. Code and models will be released at https://github.com/jslei/WFRM.
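To make the feature-space minimax step concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: a binary logistic-regression head with weights `w` and bias `b` on fixed feature vectors `z`, labels `y` in {-1, +1}, a perturbation constrained to an $L_2$ ball of radius `rho`, and a single normalized gradient-ascent step for the inner maximization. For this linear head that single step is exact: the worst-case margin is $y(w^\top z + b) - \rho\lVert w\rVert_2$, so the inner maximization effectively adds a weight-norm term whose gradient is scaled by the per-sample factor $\sigma(-m)$, which is large for poorly fit samples. This is offered only as one illustrative reading of the "risk-based $L_2$ regularization" claim, not as the paper's derivation. All function names (`worst_case_feature`, `wfrm_style_step`, etc.) are hypothetical, and this is not the authors' released code; the actual method may use a different constraint, step rule, or multiclass head.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def logistic_loss(margin):
    # log(1 + exp(-margin)), computed stably for either sign of the margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def worst_case_feature(z, y, w, b, rho):
    """Inner max (hypothetical helper): move features z by rho along the
    normalized loss gradient, i.e. onto the boundary of the L2 ball.
    For a linear logistic head this gives the exact worst case."""
    margin = y * (dot(w, z) + b)
    coef = -y * sigmoid(-margin)          # d loss / d z = coef * w
    grad_z = [coef * wi for wi in w]
    g = norm(grad_z)
    if g == 0.0:
        return list(z)
    return [zi + rho * gi / g for zi, gi in zip(z, grad_z)]

def wfrm_style_step(data, w, b, rho, lr):
    """Outer min (hypothetical helper): one full-batch gradient step on (w, b)
    using the worst-case features. Returns new params and the mean
    worst-case loss measured at the old params."""
    gw = [0.0] * len(w)
    gb = 0.0
    total = 0.0
    for z, y in data:
        z_adv = worst_case_feature(z, y, w, b, rho)
        m = y * (dot(w, z_adv) + b)
        total += logistic_loss(m)
        s = -y * sigmoid(-m)              # d loss / d (w . z_adv + b)
        for i in range(len(w)):
            # s * z_adv = s * z + rho * sigmoid(-m) * w / ||w||:
            # a weight-shrinking term scaled by how badly the sample is fit.
            gw[i] += s * z_adv[i]
        gb += s
    n = len(data)
    w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
    b = b - lr * gb / n
    return w, b, total / n

if __name__ == "__main__":
    toy = [([1.0, 2.0], 1), ([2.0, 1.0], 1),
           ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
    w, b = [0.1, 0.1], 0.0
    for step in range(50):
        w, b, loss = wfrm_style_step(toy, w, b, rho=0.1, lr=0.1)
        if step % 10 == 0:
            print(f"step {step}: worst-case loss {loss:.4f}")
```

Because the perturbation is recomputed from the current weights at every step, the shrinkage on `w` adapts to the current training risk rather than being a fixed weight-decay coefficient; with `rho=0` the step reduces to plain ERM on the clean features.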
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Minor revision: (1) We have enriched the discussion of existing Few-Shot Learning methods in the Introduction. (2) We have included quantitative measurements of the inverse visualizations in A.6.
Assigned Action Editor: ~Hanwang_Zhang3
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1198