Track: Economics, online markets and human computation
Keywords: Economics of fairness, balanced training data, endogenous data production
TL;DR: Whether online data markets can sustainably and efficiently address ethical issues in machine learning depends on market conditions.
Abstract: Many ethical issues in machine learning are connected to the training data. Online data markets are an important source of training data, facilitating both production and distribution. Recently, a trend has emerged of for-profit “ethical” participants in online data markets. This trend raises a fascinating question: Can online data markets sustainably and efficiently address ethical issues in the broader machine-learning economy?
In this work, we study this question in a stylized model of an online data market. We investigate the effects of intervening in the data market to achieve balanced training-data production. The model reveals the crucial role of market conditions. Under some conditions, an intervention can drive the data producers out of the market, so that the cost of fairness is maximal. Yet, under other conditions, the cost of fairness can vanish (as a fraction of overall welfare) as the market grows.
Our results suggest that “ethical” online data markets can be economically feasible under favorable market conditions, and motivate more work to consider the role of data production and distribution in mediating the impacts of ethical interventions.
Submission Number: 556