From Benchmarking to Understanding FairML

Mykola Pechenizkiy, Hilde Weerts, Cassio de Campos, Yuya Sasaki, Julia Stoyanovich

Published: 21 Oct 2025, Last Modified: 15 Dec 2025. License: CC BY-SA 4.0
Abstract: Benchmarks play a central role in machine learning (ML), offering standardized datasets and metrics that enable comparison and drive progress. In fairness-aware ML (fairML), however, benchmarks pose distinctive challenges. Fairness is not a purely technical property but a socio-technical concept, shaped by normative choices and institutional context. Benchmarks strip away this context: they reduce fairness to intrinsic metrics, obscure what is comparable, and collapse distinct notions of justice—from distributive allocation in credit scoring to basic rights in criminal justice—into a single optimization task. Moreover, when used as measures of progress, benchmarks risk enshrining oversimplified metrics as community standards, assuming an exception to Goodhart’s law. We argue that while benchmarking has value for building baselines and organizing competition, responsible evaluation of fairML requires complementary frameworks: ones that combine intrinsic with extrinsic, context-sensitive assessments, and that make explicit the normative assumptions underlying fairness interventions.