Keywords: Fair Representation Learning, Statistical Guarantees, Controllable Guarantees
TL;DR: We introduce a representation learning framework that provides high-confidence fairness guarantees with controllable error thresholds and confidence levels via adversarial inference.
Abstract: Representation learning is increasingly applied to generate representations that generalize well across multiple downstream tasks.
Ensuring fairness in representation learning is crucial to prevent unfair treatment of specific demographic groups in downstream tasks.
In this work, we formally introduce the task of learning representations that achieve high-confidence fairness.
We aim to guarantee that demographic disparity in every downstream prediction remains bounded by a *user-defined* error threshold $\epsilon$, with *controllable* high probability.
To this end, we propose the ***F**air **R**epresentation learning with high-confidence **G**uarantees (FRG)* framework, which provides these high-confidence fairness guarantees by leveraging an optimized adversarial model.
We empirically evaluate FRG on three real-world datasets, comparing its performance to six state-of-the-art fair representation learning methods.
Our results demonstrate that FRG consistently bounds unfairness across a range of downstream models and tasks.
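The demographic disparity that the abstract proposes to bound by $\epsilon$ is, in its standard form, the absolute gap in positive-prediction rates between two demographic groups. The following minimal sketch (not the paper's implementation; the function name and toy data are illustrative) shows how this quantity is typically measured for a downstream classifier:

```python
import numpy as np

def demographic_disparity(y_pred, group):
    """Absolute gap in positive-prediction rates between groups 0 and 1
    (the standard demographic-parity difference)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate0 = y_pred[group == 0].mean()  # positive rate for group 0
    rate1 = y_pred[group == 1].mean()  # positive rate for group 1
    return abs(rate0 - rate1)

# Toy example: group 0 receives positives at rate 0.75, group 1 at 0.25.
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_disparity(y_pred, group))  # 0.5
```

A fairness guarantee of the kind described would require this gap to stay below a user-chosen $\epsilon$, with high probability, for every downstream predictor trained on the learned representations.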
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 13122