Differentiable Optimal Adversaries for Learning Fair Representations

Anonymous

21 Oct 2020 (modified: 05 May 2023) · Submitted to LMCA2020
Keywords: deep learning, adversarial representation learning, fairness, optimization
TL;DR: We approach fair representation learning by differentiating through the training procedure of an unfairness-detecting logistic regression, deriving gradients of the fully-trained model's parameters with respect to the input embedding.
Abstract: Fair representation learning is an important task in many real-world domains, where the goal is to find a performant model that obeys fairness requirements. We present an adversarial representation learning algorithm that learns an informative representation without exposing sensitive features. Our goal is to train an embedding that performs well on a target task while not exposing sensitive information, as measured by the performance of an optimally trained adversary. Our approach trains the embedding with these dual objectives in mind by implicitly differentiating through the optimal adversary's training procedure. To this end, we derive implicit gradients of the optimal logistic regression parameters with respect to the input training embeddings, and use the fully-trained logistic regression as an adversary. As a result, we can train the embedding without alternating min-max optimization, leading to better training stability and improved performance. Given the flexibility of our module for differentiable programming, we evaluate the impact of using implicit gradients in two adversarial, fairness-centric formulations. We present quantitative results on the trade-offs between target and fairness tasks in several real-world domains.
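The core module the abstract describes — implicit gradients of an optimally trained logistic-regression adversary with respect to its input embeddings — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names (`fit_adversary`, `implicit_grad`), the Newton inner solver, and the choice of a held-out adversary loss as the outer objective are all assumptions. It relies on the implicit function theorem: at the adversary's optimum, the gradient of its training loss vanishes, so the optimal weights' sensitivity to the embeddings follows from the Hessian, with no need to unroll or alternate min-max updates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_adversary(X, y, lam, iters=25):
    """Train an L2-regularized logistic-regression adversary to optimality
    with Newton's method (a hypothetical stand-in for the paper's
    fully-trained adversary)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) + lam * w
        H = (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
        w -= np.linalg.solve(H, grad)
    return w

def implicit_grad(X, y, Xv, yv, lam):
    """Gradient of a held-out adversary loss w.r.t. the training embeddings X.
    At the optimum grad_w L(w*, X) = 0, so by the implicit function theorem
    dw*/dX = -H^{-1} d(grad_w L)/dX, where H is the training-loss Hessian."""
    w = fit_adversary(X, y, lam)
    p = sigmoid(X @ w)
    s = p * (1.0 - p)
    H = (X * s[:, None]).T @ X + lam * np.eye(X.shape[1])
    dfdw = Xv.T @ (sigmoid(Xv @ w) - yv)   # outer-loss gradient at w*
    v = np.linalg.solve(H, dfdw)           # v = H^{-1} df/dw*
    # Chain rule through w*(X), written out for logistic regression:
    return -((p - y)[:, None] * v[None, :]
             + (s * (X @ v))[:, None] * w[None, :])

# Tiny synthetic check: compare one entry against central finite differences.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
Xv = rng.normal(size=(10, 3))
yv = (rng.random(10) < 0.5).astype(float)
lam = 0.5

G = implicit_grad(X, y, Xv, yv, lam)

def outer_loss(Xp):
    w = fit_adversary(Xp, y, lam)
    pv = sigmoid(Xv @ w)
    return -np.sum(yv * np.log(pv) + (1 - yv) * np.log(1 - pv))

eps = 1e-5
Xp, Xm = X.copy(), X.copy()
Xp[0, 0] += eps
Xm[0, 0] -= eps
fd = (outer_loss(Xp) - outer_loss(Xm)) / (2 * eps)
```

In the fairness setting sketched in the abstract, the embedding network would be updated to *increase* this adversary loss (hiding the sensitive attribute) while decreasing the target-task loss, with `G` flowing back into the embedding parameters by ordinary backpropagation.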