Abstract: Machine learning systems can make more errors for some populations than for others, and thus discriminate against them. To assess such fairness issues, errors are typically compared across populations. We argue that we must also account for the variability of errors in practice, as the errors measured on test data may differ from those on real-life data (called target data). We first introduce statistical methods for estimating the random error variance in machine learning problems. The methods estimate how often errors would exceed certain magnitudes, and how often the errors of one population would exceed those of another (e.g., by more than a certain margin). The methods are based on well-established sampling theory and the recently introduced Sample-to-Sample estimation. The latter shows that small target samples yield high error variance, even if the test sample is very large. We demonstrate that, in practice, minorities are bound to bear higher variance, and hence amplified error and bias. This can occur even if the test and training sets are accurate, representative, and extremely large. We call this statistical phenomenon the curse on minorities, and we illustrate its impact with basic classification and regression problems. Finally, we outline potential approaches to protect minorities from this curse, and to develop variance-aware fairness assessments.
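The core statistical point above — that small target samples yield high error variance even when the true error rate is known exactly — can be illustrated with a minimal simulation. This is a hedged sketch, not the paper's method: the error rates, group sizes, and margin below are hypothetical values chosen for illustration.

```python
import numpy as np

# Illustrative sketch (assumed values, not the paper's experiments):
# even if the true error rate is identical for two groups, the error
# observed on a finite target sample fluctuates, and the smaller the
# sample, the larger the fluctuation.
rng = np.random.default_rng(0)

true_error = 0.10      # assumed true error rate, same for both groups
n_majority = 10_000    # hypothetical target-sample size, majority group
n_minority = 100       # hypothetical target-sample size, minority group
n_trials = 50_000      # number of simulated target samples

# Observed error rate = fraction of misclassified points in each sample,
# simulated by drawing the number of errors from a binomial distribution.
maj_errors = rng.binomial(n_majority, true_error, n_trials) / n_majority
min_errors = rng.binomial(n_minority, true_error, n_trials) / n_minority

print(f"majority: std of observed error = {maj_errors.std():.4f}")
print(f"minority: std of observed error = {min_errors.std():.4f}")

# With identical true errors, the minority's observed error is far more
# likely to exceed the true rate by a given margin in practice.
margin = 0.03
print(f"P(minority error > true + {margin}) ≈ "
      f"{(min_errors > true_error + margin).mean():.3f}")
print(f"P(majority error > true + {margin}) ≈ "
      f"{(maj_errors > true_error + margin).mean():.3f}")
```

The standard deviation of the observed error scales as \(\sqrt{p(1-p)/n}\), so shrinking the target sample by a factor of 100 inflates the variability tenfold — the "curse on minorities" in its simplest form.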