Keywords: Distribution shift, group shift, DRO, label noise
TL;DR: Assuming sub-populations are realized by simple functions leads to a constrained DRO that yields less pessimistic solutions.
Abstract: Although training machine learning models for robustness is critical for real-world adoption, determining how to best ensure robustness remains an open problem. Some methods (e.g., DRO) are overly conservative, while others (e.g., Group DRO) require domain knowledge that may be hard to obtain. In this work, we address limitations in prior approaches by assuming a more nuanced form of group shift: conditioned on the label, we assume that the true group function is simple. For example, we may expect that group shifts occur along high-level features (e.g., image background, lighting). Thus, we aim to learn a model that maintains high accuracy on simple group functions realized by these features, but need not spend valuable model capacity achieving high accuracy on contrived groups of examples. Based on this idea, we formulate a two-player game where conditioned on the label the adversary can only separate datapoints into potential groups using simple features, which corresponds to a bitrate constraint on the adversary's capacity. Our resulting practical algorithm, Bitrate-Constrained DRO (BR-DRO), does not require group information on training samples yet matches the performance of Group DRO. Our theoretical analysis reveals that in some settings BR-DRO objective can provably yield statistically efficient and less pessimistic solutions than unconstrained DRO.