Abstract: Real-world applications of machine learning tools in high-stakes domains are often regulated to be fair, in the sense that the predicted target should satisfy some quantitative notion of parity with respect to a protected attribute. However, the exact tradeoff between fairness and accuracy with a real-valued target is not entirely clear. In this paper, we characterize the inherent tradeoff between statistical parity and accuracy in the regression setting by providing a lower bound on the error of any attribute-blind fair regressor. Our lower bound is sharp, algorithm-independent, and admits a simple interpretation: when the moments of the target differ between groups, any fair algorithm has to make an error on at least one of the groups. We further extend this result to give a lower bound on the joint error of any (approximately) fair algorithm, using the Wasserstein distance to measure the quality of the approximation. With our novel lower bound, we also show that the price paid by a fair regressor that does not take the protected attribute as input is less than that of a fair regressor with explicit access to the protected attribute. On the upside, we establish the first connection between individual fairness, accuracy parity, and the Wasserstein distance by showing that if a regressor is individually fair, it also approximately satisfies accuracy parity, where the gap is again given by the Wasserstein distance between the two groups. Inspired by our theoretical results, we develop a practical algorithm for fair regression through the lens of representation learning, and conduct experiments on a real-world dataset to corroborate our findings.
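To make the Wasserstein-flavored lower bound concrete, here is a minimal numerical sketch (illustrative data only, not the paper's actual bound or experiments): for two groups with shifted target distributions, the 1-Wasserstein distance between the group-conditional targets is strictly positive, and a constant predictor, which trivially satisfies statistical parity, incurs group errors whose sum is at least that distance.

```python
import numpy as np

# Hypothetical per-group regression targets (illustrative only).
y_group_a = np.array([0.0, 1.0, 2.0, 3.0])
y_group_b = np.array([2.0, 3.0, 4.0, 5.0])

# For equal-size, equal-weight 1D empirical distributions, the
# 1-Wasserstein distance is the mean absolute gap between sorted samples.
w1 = np.abs(np.sort(y_group_a) - np.sort(y_group_b)).mean()
print(w1)  # 2.0

# A constant predictor satisfies statistical parity by construction.
# Its mean absolute errors on the two groups must sum to at least W1,
# mirroring the flavor of the paper's lower bound.
c = 2.5
err_a = np.abs(y_group_a - c).mean()
err_b = np.abs(y_group_b - c).mean()
print(err_a + err_b >= w1)  # True
```

The constant `c = 2.5` is an arbitrary illustrative choice; the sum-of-errors inequality holds for any constant, since moving closer to one group's targets necessarily moves away from the other's.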
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Fqzn1iPoW3
Changes Since Last Submission: In what follows we provide a summary of the changes since the last TMLR submission:

- We provide a more detailed discussion of the objective function used in this paper, and contrast it with the usual $\ell_p$ loss over the joint distribution $\mu$ (Section 2, Fair Regression). We have also revised the whole paper to ensure that consistent notation is used throughout.
- We add an example to illustrate the tightness of the lower bound for all $p \geq 1$ (Section 3.1, Page 5).
- We provide an extended discussion comparing and contrasting our results with the existing lower bounds in the literature (Section 3.1, Pages 7-8). We point out the connection in the special case of $p = 2$ under the noiseless setting, and also comment on how it generalizes to the general case $p \geq 1$.
- We add more transitions to motivate the proposed method (Section 3.4) and also discuss how it differs from existing methods based on learning fair representations.
- We have addressed all the minor comments from the last submission regarding phrasing, notation, and the discussion of related works.

### 02/2023:

- We have performed the corresponding experiments and updated the paper accordingly. The updated results and their discussion can be found in Section 4 and Table 2.
Assigned Action Editor: ~Bo_Dai1
Submission Number: 489