Per-Group Distributionally Robust Optimization (Per-GDRO) with Learnable Ambiguity Set Sizes via Bilevel Optimization

Published: 22 Sept 2025, Last Modified: 01 Dec 2025 · NeurIPS 2025 Workshop · CC BY 4.0
Keywords: Group Distributionally Robust Optimization, phi-divergence, Wasserstein distance, derivative-free optimization
TL;DR: We propose Per-GDRO, a bilevel optimization framework that adaptively learns group-specific robustness levels to ensure fairness and resilience against both inter-group and within-group distributional shifts.
Abstract: Group structures frequently influence model behavior; however, group membership is often unobserved during inference, limiting explicit control over group-specific performance. As a result, models may perform well on certain groups while underperforming on others, raising concerns about fairness. Moreover, each group may follow a different distribution, and a subset of groups may be more susceptible to distributional shifts due to external factors such as policy changes or environmental variation. To address these challenges, we propose a Per-Group Distributionally Robust Optimization (Per-GDRO) framework that ensures fairness across groups and robustness to group-specific distributional shifts. In this framework, a $\phi$-divergence ambiguity set governs adversarial group reweighting, and Wasserstein ambiguity sets capture local uncertainty within each group. We then develop an iterative algorithm that alternates between updating the model and updating the adversarial distributions across and within groups. Finally, we employ a derivative-free surrogate optimization method to adaptively determine the sizes of these ambiguity sets.
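The alternating scheme described in the abstract can be sketched in a toy form. Everything below is illustrative, not the authors' method: the linear squared-loss model, the FGSM-style input perturbation (a crude stand-in for a within-group Wasserstein ambiguity set), the exponentiated-gradient reweighting (a stand-in for $\phi$-divergence-constrained group reweighting), and all names and hyperparameters (`rho`, `eta`, `per_gdro_sketch`) are assumptions; the paper's derivative-free tuning of ambiguity-set sizes is omitted here.

```python
import numpy as np

def per_gdro_sketch(Xs, ys, lr=0.1, eta=0.5, rho=0.05, steps=200, seed=0):
    """Toy alternating loop (illustrative only):
    1. inner: perturb each group's inputs in the loss-increasing
       direction (rough proxy for a local within-group shift),
    2. outer: reweight groups by a multiplicative-weights step
       (proxy for adversarial phi-divergence reweighting),
    3. descend on the weighted worst-case loss.
    """
    rng = np.random.default_rng(seed)
    G = len(Xs)
    d = Xs[0].shape[1]
    w = rng.normal(scale=0.1, size=d)   # model parameters
    q = np.ones(G) / G                  # adversarial group weights (simplex)
    for _ in range(steps):
        group_losses = np.zeros(G)
        grad = np.zeros(d)
        for g in range(G):
            X, y = Xs[g], ys[g]
            resid = X @ w - y
            # sign of d(loss_i)/d(X_ij) = 2 * resid_i * w_j; take a small
            # adversarial step on the inputs (FGSM-style perturbation)
            Xp = X + rho * np.sign(resid[:, None] * w[None, :])
            resid_p = Xp @ w - y
            group_losses[g] = np.mean(resid_p ** 2)
            grad += q[g] * 2 * Xp.T @ resid_p / len(y)
        # exponentiated-gradient ascent on group weights, renormalized
        q = q * np.exp(eta * group_losses)
        q = q / q.sum()
        # gradient descent on model parameters under the weighted loss
        w = w - lr * grad
    return w, q

# Tiny demo on two synthetic groups drawn from shifted distributions.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0])
X1 = rng.normal(size=(50, 2));       y1 = X1 @ w_true
X2 = rng.normal(size=(50, 2)) + 1.0; y2 = X2 @ w_true
w_hat, q_hat = per_gdro_sketch([X1, X2], [y1, y2])
clean_losses = [np.mean((X @ w_hat - y) ** 2) for X, y in [(X1, y1), (X2, y2)]]
```

The key design point, mirroring the abstract, is that the group weights `q` and the within-group perturbations are both adversarial, while the model parameters `w` are updated against the resulting worst case; in the paper the radii of the two ambiguity sets would additionally be learned in a bilevel, derivative-free outer loop rather than fixed as `rho` and `eta` are here.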
Submission Number: 163