Bias-inducing geometries: exactly solvable data model with fairness implications

Published: 17 Jun 2024, Last Modified: 02 Jul 2024
Venue: ICML 2024 Workshop GRaM
License: CC BY 4.0
Track: Extended abstract
Keywords: Data structure, fairness, statistical physics
TL;DR: By designing a solvable generative model, we quantify the impact of data structure in generating bias.
Abstract: Machine learning (ML) may be oblivious to human bias, but it is not immune to perpetuating it. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even amplified by the learning models. In this abstract, we aim to clarify the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Through the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables that are commonly employed for fairness assessment. Simplifying the problem to its minimal components, we can retrace and unpack the typical unfairness behaviour observed on real-world datasets.
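To make the setup concrete, below is a minimal sketch of the kind of parametrically controlled, bias-inducing data geometry the abstract describes: two groups drawn from a Gaussian mixture with tunable relative representation, group-specific centres, and partially aligned linear labelling rules. The accuracy gap between groups is then measured as a simple fairness observable. All specifics here (the parameter `rho`, the `sample_group` helper, the ridge-trained linear model) are illustrative assumptions, not the paper's exact solvable model or its analytical solution.

```python
# Hedged sketch: a two-group Gaussian-mixture data model with tunable
# imbalance, plus a per-group accuracy gap as a simple fairness observable.
# Parameter names and choices are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_group(n, d, teacher, center, noise=0.5):
    """Sample n points in d dims around `center`; labels from a noisy linear teacher."""
    X = center + rng.normal(size=(n, d))
    y = np.sign(X @ teacher + noise * rng.normal(size=n))
    return X, y

d = 200                       # input dimension (high-dimensional regime)
rho = 0.8                     # group imbalance: fraction of majority-group samples
n = 2000
n_maj, n_min = int(rho * n), n - int(rho * n)

# Each group has its own centre (geometry) and its own labelling rule (teacher);
# their misalignment is one of the bias-inducing factors under parametric control.
c = rng.normal(size=d) / np.sqrt(d)
t_maj = rng.normal(size=d) / np.sqrt(d)
t_min = 0.7 * t_maj + 0.3 * rng.normal(size=d) / np.sqrt(d)  # partially aligned teacher

X1, y1 = sample_group(n_maj, d, t_maj, +c)
X2, y2 = sample_group(n_min, d, t_min, -c)
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
g = np.concatenate([np.zeros(n_maj), np.ones(n_min)])  # group membership

# Train a ridge-regularised linear model on the pooled (imbalanced) data.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
pred = np.sign(X @ w)

acc_maj = (pred[g == 0] == y[g == 0]).mean()
acc_min = (pred[g == 1] == y[g == 1]).mean()
print(f"majority accuracy: {acc_maj:.3f}, minority accuracy: {acc_min:.3f}")
print(f"accuracy gap (a common fairness observable): {acc_maj - acc_min:.3f}")
```

Sweeping `rho`, the centre separation, or the teacher alignment in this sketch mimics, in simulation, the parametric exploration that the paper carries out analytically with statistical-physics tools.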
Submission Number: 3