Keywords: fairness, fair representation learning
TL;DR: We present the first provable fair representation learning method.
Abstract: Fair representation learning (FRL) is a popular class of methods that can replace the original dataset with a debiased synthetic one, which is then used to train fair classifiers. However, recent work has shown that prior methods achieve worse accuracy-fairness tradeoffs than originally suggested, motivating the need for FRL methods that provide provable bounds on the unfairness of any downstream classifier, a challenge that remains unsolved. In this work we address this challenge and propose Fairness with Restricted Encoders (FARE), the first FRL method with provable fairness guarantees. Our key insight is that restricting the representation space of the encoder enables us to derive fairness guarantees while still allowing empirical accuracy-fairness tradeoffs comparable to prior work. FARE instantiates this idea with a tree-based encoder, a choice motivated by the advantages of decision trees in our setting. Crucially, we develop and apply a practical statistical procedure that computes a high-confidence upper bound on the unfairness of any downstream classifier. In our experimental evaluation on several datasets, we demonstrate that FARE produces tight upper bounds, often comparable with the empirical results of prior methods, establishing the practical value of our approach.
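The central claim, that restricting the representation space makes it possible to bound the unfairness of *any* downstream classifier, can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the fairness metric is demographic parity (DP), the encoder maps each input to one of `num_cells` discrete cells (as the leaves of a decision tree would), and the high-confidence bound uses a simple Hoeffding-plus-union-bound argument. All function names are hypothetical, and this is not claimed to be FARE's actual statistical procedure. The key observation is that when a classifier only sees a cell id, its largest achievable DP gap equals the total variation distance between the two groups' cell distributions. That distance can be estimated from held-out samples and then inflated by a confidence radius.

```python
import math
from collections import Counter


def cell_distribution(cells, num_cells):
    """Empirical distribution over encoder cells with ids 0..num_cells-1."""
    counts = Counter(cells)
    return [counts[c] / len(cells) for c in range(num_cells)]


def worst_case_dp_gap(p0, p1):
    """Largest DP gap any (possibly randomized) classifier h(z) of the cell id can
    reach: the total variation distance between the groups' cell distributions,
    attained by setting h(z) = 1 exactly on cells where p0(z) > p1(z)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p0, p1))


def l1_radius(n, num_cells, delta):
    """With prob. >= 1 - delta, ||p_hat - p||_1 <= returned value.
    Hoeffding for each subset of cells plus a union bound over all
    2**num_cells subsets gives
    P(||p_hat - p||_1 >= eps) <= 2**num_cells * exp(-n * eps**2 / 2)."""
    return math.sqrt(2.0 * (num_cells * math.log(2.0) + math.log(1.0 / delta)) / n)


def dp_upper_bound(cells, groups, num_cells, delta=0.05):
    """Hypothetical helper: an upper bound on the DP gap of ANY downstream
    classifier of the cell id that holds with prob. >= 1 - delta.
    Assumes (cells, groups) are held-out i.i.d. samples that are independent
    of the data used to fit the encoder."""
    cells0 = [c for c, s in zip(cells, groups) if s == 0]
    cells1 = [c for c, s in zip(cells, groups) if s == 1]
    if not cells0 or not cells1:
        raise ValueError("need samples from both sensitive groups")
    p0 = cell_distribution(cells0, num_cells)
    p1 = cell_distribution(cells1, num_cells)
    # Spend delta/2 on each group's estimate (union bound), then apply the
    # triangle inequality: TV(p0, p1) <= TV(p0_hat, p1_hat) + (r0 + r1) / 2.
    r0 = l1_radius(len(cells0), num_cells, delta / 2)
    r1 = l1_radius(len(cells1), num_cells, delta / 2)
    return min(1.0, worst_case_dp_gap(p0, p1) + 0.5 * (r0 + r1))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    groups = [rng.randint(0, 1) for _ in range(20000)]
    # Toy encoder output: group 1 lands in cell 3 somewhat more often.
    cells = [rng.choice([0, 1, 2, 3, 3]) if s else rng.randint(0, 3) for s in groups]
    print(dp_upper_bound(cells, groups, num_cells=4, delta=0.05))
```

In this sketch the confidence radius grows with `num_cells`, and merging cells can never increase the total variation term. Under these assumptions, coarser representations therefore yield tighter guarantees but are less expressive, which is one way to read the accuracy-fairness tradeoff described above.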