Keywords: distribution shift, subpopulation shift, spurious correlation, influence function, sample reweighting, data selection
TL;DR: We introduce Group-robust Sample Reweighting (GSR), which uses group-labeled data to guide the iterative retraining of a model on group-unlabeled data reweighted using influence functions.
Abstract: Machine learning models often have uneven performance among subpopulations (a.k.a., groups) in the data distributions. This poses a significant challenge for the models to generalize when the proportions of the groups shift during deployment. To improve robustness to such shifts, existing approaches have developed strategies that train models or perform hyperparameter tuning using group-labeled data to minimize the worst-case loss over groups. However, a non-trivial number of high-quality labels is often required to obtain noticeable improvements. Given the cost of these labels, we propose a different paradigm to enhance group-label efficiency: utilizing the group-labeled data as a target set to optimize the weights of other group-unlabeled data. We introduce Group-robust Sample Reweighting (GSR), a two-stage approach that first learns representations from group-unlabeled data, and then refines the model by iteratively retraining its last layer on the reweighted data using influence functions. GSR is theoretically sound, practically lightweight, and effective in improving robustness to subpopulation shifts. In particular, GSR outperforms previous state-of-the-art approaches that require the same amount of, or even more, group labels. Our code is available at https://github.com/qiaoruiyt/GSR.
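The abstract summarizes the second stage only at a high level. The sketch below illustrates, under simplifying assumptions not stated in the abstract (a ridge-regression last layer with squared loss, plain gradient descent on the sample weights, and the hyperparameters shown), how influence functions can steer sample weights toward a group-labeled target set. It is an illustrative sketch, not the released GSR implementation (see the repository URL above).

```python
# Minimal sketch of influence-based last-layer sample reweighting, in the
# spirit of GSR's second stage. All modeling choices here (ridge regression,
# squared loss, learning rate, number of steps) are illustrative assumptions.
import numpy as np

def fit_last_layer(X, y, w, lam=1e-2):
    """Weighted ridge regression: solve (X^T diag(w) X + lam*I) theta = X^T diag(w) y."""
    d = X.shape[1]
    H = X.T @ (w[:, None] * X) + lam * np.eye(d)   # Hessian of the weighted training loss
    theta = np.linalg.solve(H, X.T @ (w * y))
    return theta, H

def influence_on_target(X_tr, y_tr, X_tg, y_tg, theta, H):
    """Influence of upweighting each training point on the target-set loss:
    I_i = -grad_target(theta)^T H^{-1} grad_i(theta), with per-sample squared loss."""
    r_tr = X_tr @ theta - y_tr                      # training residuals
    r_tg = X_tg @ theta - y_tg                      # target residuals
    g_tg = X_tg.T @ r_tg / len(y_tg)                # gradient of the mean target loss
    v = np.linalg.solve(H, g_tg)                    # H^{-1} g_target
    return -(X_tr @ v) * r_tr                       # one influence score per training sample

def influence_reweighting(X_tr, y_tr, X_tg, y_tg, steps=50, lr=0.5):
    """Iteratively retrain the last layer on reweighted data, nudging the weights
    toward samples whose upweighting reduces the group-labeled target loss."""
    w = np.ones(len(y_tr)) / len(y_tr)
    theta = None
    for _ in range(steps):
        theta, H = fit_last_layer(X_tr, y_tr, w)
        infl = influence_on_target(X_tr, y_tr, X_tg, y_tg, theta, H)
        w = np.clip(w - lr * infl, 0.0, None)       # descend the target loss w.r.t. weights
        w = w / w.sum()                             # keep the weights a distribution
    return theta, w
```

In this sketch the group-labeled data enter only through the target-loss gradient, matching the paradigm described in the abstract of using them as a target set rather than as additional training data; a worst-group target loss or a different last-layer objective would slot in by changing the residual and gradient computations.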
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13637