The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning

Published: 08 Feb 2024, Last Modified: 08 Feb 2024. Accepted by TMLR.
Abstract: Modern data aggregation often involves a platform collecting data from a network of users with various privacy options. Platforms must solve the problem of how to allocate incentives to users to convince them to share their data. This paper puts forth an idea for a fair amount to compensate users for their data at a given privacy level based on an axiomatic definition of fairness, along the lines of the celebrated Shapley value. To the best of our knowledge, these are the first fairness concepts for data that explicitly consider privacy constraints. We also formulate a heterogeneous federated learning problem for the platform with privacy level options for users. By studying this problem, we investigate the amount of compensation users receive under fair allocations with different privacy levels, amounts of data, and degrees of heterogeneity. We also discuss what happens when the platform is forced to design fair incentives. Under certain conditions, we find that when privacy sensitivity is low, the platform will set incentives to ensure that it collects all the data with the lowest privacy options. When the privacy sensitivity is above a given threshold, the platform will provide no incentives to users. Between these two extremes, the platform will set the incentives so that some fraction of the users chooses the higher privacy option and the others choose the lower privacy option.
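For readers unfamiliar with the Shapley value mentioned in the abstract, the following is a minimal sketch of the classical Shapley value only. It does not implement the privacy-aware fairness notion the paper proposes. The names `data_sizes` and `toy_utility` are hypothetical and chosen purely for illustration; the logarithmic utility here is an assumed stand-in for diminishing returns in the amount of data.

```python
from itertools import permutations
from math import factorial, log1p

def shapley_values(players, utility):
    """Classical Shapley value: each player's marginal contribution,
    averaged over every ordering in which players could join."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

# Hypothetical toy setting: three users holding different amounts of data.
data_sizes = {"a": 10, "b": 10, "c": 30}

def toy_utility(coalition):
    # Assumed utility: diminishing returns in the total number of data points.
    return log1p(sum(data_sizes[p] for p in coalition))

values = shapley_values(list(data_sizes), toy_utility)
```

In this toy setting, the values sum to the utility of the full coalition, users `a` and `b` receive the same value because they contribute identically, and user `c` receives more because it holds more data. Exact enumeration scales factorially in the number of users, so it is only practical for very small examples.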
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: This is the third revision of this paper. Below is the list of notable changes in this revision:

**Abstract**
1. Minor changes to make the abstract more concise.

**Introduction**
1. Slightly changed wording to be more precise in the opening 3-level privacy example.
2. Modified the main contributions section. Removed opening exposition about prior works (now in bullet 4). Added some content about Section 4.2 to bullet 3.
3. Added a reference to the new Appendix section with the notation table.

**Section 2**
1. Made the $\rho=1$ example less vague.
2. Explained the difference between privacy sensitivity and privacy level.
3. Added mechanism design over pure strategy NEs and further discussion of which one is suitable (because we now use pure NE in Section 4.2).

**Section 3**
1. Reworded the introduction to be more concise.
2. Added plain language explanations to the axioms.
3. Changed $a \rightarrow z$ for the platform action in Theorem 1.
4. Added direct references to the context of existing literature following each axiom.
5. Moved all examples to one section at the end.
6. Expanded the Computational Complexity paragraph and moved it to a less awkward location in the text.

**Section 4**
1. New table to explain $\rho_i=0,1,2$, with relevant references in the text.
2. Explained the log utility function.
3. Bold text outlining what $a_i$ means.
4. New discussion of equilibrium and Fig. 6, explaining why users choose $\rho$.
5. Fig. 7 showing the solution to the platform's problem, with surrounding discussion about how to solve it and what makes these problems generally hard to solve. Added a reference to the new Appendix section with more information about the solution.
6. Cited other federated learning-related papers that could fit into the fairness framework.

**Section 5**
1. New introduction to better contextualize with the new Section 4.2.
2. New Fig. 8 to include information about $\alpha^*$. Greatly expanded discussion in Section 5.1.

**Conclusion**
1. Comments relating to the new Section 4.2.
2. Comments mentioning the importance of dealing with a changing user base in future directions.

**Appendix**
1. New notation table.
2. New section explaining how we did the calculations in Section 4.2 efficiently.
3. New section with analytic results for Section 5.1.
Assigned Action Editor: ~Ahmad_Beirami1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1415