TL;DR: This work studies online learning and anytime-valid inference for general social welfare objectives under partial feedback.
Abstract: In many real-world settings, a centralized decision-maker must repeatedly allocate finite resources to a population over multiple time steps. Individuals who receive a resource derive some stochastic utility; to characterize the population-level effects of an allocation, the expected individual utilities are then aggregated using a social welfare function (SWF). We formalize this setting and present a general confidence sequence framework for SWF-based online learning and inference, valid for any monotonic, concave, and Lipschitz-continuous SWF. Our key insight is that monotonicity alone suffices to lift confidence sequences from individual utilities to anytime-valid bounds on optimal welfare. Building on this foundation, we propose SWF-UCB, an SWF-agnostic online learning algorithm that achieves near-optimal $\tilde{\mathcal{O}}(n+\sqrt{nkT})$ regret (for $k$ resources distributed among $n$ individuals at each of $T$ time steps). We instantiate our framework on three normatively distinct SWF families (Weighted Power Mean, Kolm, and Gini) and provide bespoke oracle algorithms for each. Experiments confirm $\sqrt{T}$ scaling and reveal rich interactions between $k$ and SWF parameters. This framework naturally supports inference applications such as sequential hypothesis testing, optimal stopping, and policy evaluation.
Lay Summary: In many real-world settings, a centralized decision-maker must repeatedly allocate finite resources to a population over multiple time steps. Individuals who receive the resource derive some random utility from it. A natural goal for the decision-maker is to learn or infer a randomized allocation that maximizes an aggregate of the expected individual utilities, where the aggregation is specified by a social welfare function. In this work, we develop a statistical framework that enables online learning and inference of these optimal allocations. Along with the general framework, we consider three popular families of social welfare functions and provide exact methods for each. We also conduct synthetic experiments in the online learning setup to study how outcomes vary as problem parameters change.
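The monotonicity argument in the abstract can be illustrated with a minimal Python sketch. It is not the paper's SWF-UCB algorithm or any of its oracles. It rests on several assumptions made only for illustration: an individual's expected utility under a randomized allocation is taken to be the allocation probability times a mean utility, the optimum is searched over a small hypothetical candidate set instead of the full allocation space, the SWF is an unweighted power mean with exponent in (0, 1], and the per-individual intervals `lo` and `hi` are supplied by hand where a real confidence sequence would provide them. All function and variable names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def power_mean(u: Vector, q: float = 0.5) -> float:
    """Unweighted power mean of non-negative utilities.

    For q > 0 it is non-decreasing in every coordinate (assumed SWF for this sketch).
    """
    return (sum(x ** q for x in u) / len(u)) ** (1.0 / q)


def best_welfare(swf: Callable[[Vector], float],
                 allocations: Sequence[Vector],
                 mu: Vector) -> float:
    """Largest welfare over the candidate allocations.

    Assumption: individual i's expected utility under allocation p is p[i] * mu[i].
    """
    return max(swf([p_i * m_i for p_i, m_i in zip(p, mu)]) for p in allocations)


def lift_bounds(swf: Callable[[Vector], float],
                allocations: Sequence[Vector],
                lo: Vector,
                hi: Vector) -> tuple:
    """Turn coordinate-wise bounds lo <= mu <= hi into bounds on optimal welfare.

    Because swf is monotone, evaluating it at lo (resp. hi) under-estimates
    (resp. over-estimates) the welfare of every allocation, hence of the best one.
    """
    return best_welfare(swf, allocations, lo), best_welfare(swf, allocations, hi)


# Hypothetical toy instance: n = 3 individuals, k = 1 resource.
allocations = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1 / 3, 1 / 3, 1 / 3],
]
mu = [0.4, 0.6, 0.3]   # true mean utilities, unknown to the learner
lo = [0.2, 0.4, 0.1]   # hand-picked lower confidence bounds
hi = [0.6, 0.8, 0.5]   # hand-picked upper confidence bounds

lower, upper = lift_bounds(power_mean, allocations, lo, hi)
truth = best_welfare(power_mean, allocations, mu)
```

Whenever the supplied intervals contain the true means, `lower <= truth <= upper` holds for any monotone `swf`. Concavity and Lipschitz continuity play no role in this step, which matches the abstract's statement that monotonicity alone suffices for the lift.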
Originally Submitted Supplementary Material: zip
Primary Area: Theory->Online Learning and Bandits
Keywords: Resource allocation, social welfare functions, fairness, online learning and inference
Originally Submitted PDF: pdf
Submission Number: 29237