Keywords: ML fairness, online learning, reinforcement learning
TL;DR: A new mathematical framework for optimizing diverse, conflicting objectives (such as fairness metrics) via user feedback
Abstract: Large-scale deployed learning systems are often evaluated along
multiple objectives or criteria. But how can we learn or optimize
such complex systems, with potentially conflicting or even
incompatible objectives? How can we improve the system when user feedback becomes available, possibly alerting us to issues the system was not previously optimized for?
We present a new theoretical model for learning and optimizing such
complex systems. Rather than committing to a static or pre-defined
tradeoff among the multiple objectives, our model is guided by the
feedback received, which it uses to update its internal state.
Our model supports multiple objectives of very general
form and takes into account their potential incompatibilities.
We consider both a stochastic and an adversarial setting. In the
stochastic setting, we show that our framework can be naturally cast
as a Markov Decision Process with stochastic losses, for which we give
efficient algorithms with vanishing regret. In the adversarial
setting, we design efficient algorithms with competitive ratio
guarantees.
We also report the results of experiments with our stochastic
algorithms, validating their effectiveness.
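The abstract does not describe the algorithms themselves, so the following is only an illustrative sketch, not the paper's method: a plain full-information exponential-weights (Hedge) loop over a small set of candidate objective tradeoffs with stochastic losses, whose average regret vanishes with the horizon. The loss function and the three candidate tradeoffs are hypothetical placeholders standing in for feedback-driven objective losses.

```python
import numpy as np

def hedge(loss_fn, n_actions, horizon, eta=None, seed=0):
    """Exponential-weights (Hedge) loop with full-information feedback.

    Each round we play a distribution over n_actions candidate tradeoffs,
    observe a loss vector in [0, 1]^n_actions (e.g. derived from user
    feedback), and reweight. Average regret against the best fixed
    tradeoff shrinks as O(sqrt(log(n_actions) / horizon)).
    """
    eta = eta if eta is not None else np.sqrt(np.log(n_actions) / horizon)
    weights = np.ones(n_actions)
    cumulative = np.zeros(n_actions)   # total loss of each fixed tradeoff
    total_loss = 0.0                   # expected loss of the learner
    for t in range(horizon):
        probs = weights / weights.sum()
        losses = loss_fn(t)                  # stochastic per-tradeoff losses this round
        total_loss += losses @ probs         # learner's expected loss under its distribution
        cumulative += losses
        weights *= np.exp(-eta * losses)     # multiplicative-weights update
    regret = total_loss - cumulative.min()   # compare to best fixed tradeoff in hindsight
    return regret / horizon                  # average regret -> 0 as horizon grows

# Hypothetical usage: three candidate tradeoffs with noisy losses in [0, 1].
avg_regret = hedge(lambda t: np.random.default_rng(t).uniform(size=3),
                   n_actions=3, horizon=5000)
print(f"average regret: {avg_regret:.4f}")
```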