Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We design algorithms to incentivize data sharing in the presence of strategic agents.
Abstract: We study a collaborative learning problem in which $m$ agents aim to estimate a vector $\mu =(\mu_1,\ldots,\mu_d)\in \mathbb{R}^d$ by sampling from the associated univariate normal distributions $(\mathcal{N}(\mu_k, \sigma^2))_{k\in[d]}$. Agent $i$ incurs a cost $c_{i,k}$ to sample from $\mathcal{N}(\mu_k, \sigma^2)$. Instead of working independently, agents can exchange data, collecting cheaper samples and sharing them in return for costly data, thereby reducing both costs and estimation error. We design a mechanism to facilitate such collaboration while addressing two key challenges: ensuring *individually rational (IR) and fair outcomes*, so that all agents benefit, and *preventing strategic behavior* (e.g., non-collection or data fabrication) that leads to socially undesirable outcomes. We design a mechanism and an associated Nash equilibrium (NE) that minimize the social penalty (the sum of the agents' estimation errors and collection costs) while being IR for all agents. We achieve an $\mathcal{O}(\sqrt{m})$-approximation to the minimum social penalty in the worst case and an $\mathcal{O}(1)$-approximation under favorable conditions. Additionally, we establish three hardness results: no nontrivial mechanism *(i)* admits a dominant-strategy equilibrium in which agents report truthfully, *(ii)* is IR for every strategy profile of the other agents, or *(iii)* avoids a worst-case $\Omega(\sqrt{m})$ price of stability in any NE. Finally, by integrating concepts from axiomatic bargaining, we demonstrate that our mechanism supports fairer outcomes than one that exactly minimizes the social penalty.
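For concreteness, one plausible formalization of the social penalty described above, assuming agent $i$ collects $n_{i,k}$ samples of coordinate $k$ and forms an estimate $\hat{\mu}_i$ after the exchange (the variables $n_{i,k}$ and $\hat{\mu}_i$ are our notation, not necessarily the paper's exact objective):

$$\text{SP} \;=\; \sum_{i=1}^{m} \left( \underbrace{\mathbb{E}\left[\lVert \hat{\mu}_i - \mu \rVert_2^2\right]}_{\text{estimation error of agent } i} \;+\; \underbrace{\sum_{k=1}^{d} c_{i,k}\, n_{i,k}}_{\text{collection cost of agent } i} \right)$$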
Lay Summary: Machine learning has increased the value of data, which is often costly to collect but easy to share. While most existing data sharing platforms assume honesty, participants may lie about their contributions to gain access to others’ data. When participants have different data collection costs, there are two key challenges in designing truthful data sharing algorithms. The first is creating a method to validate the submission of each contributor using the other participants’ data. The second is ensuring there is enough data from all participants so that each agent’s submission can be sufficiently validated against the others, without compromising efficiency. We address these problems in two parts. First, we determine how to fairly divide the work of data collection. Second, we reward participants based on the quality of the data they submit. For each participant, we compare the mean of their data to the mean of the others’ data. Instead of returning the others’ data to them directly, we first corrupt it based on the difference of the means. If a participant wants to receive the others’ data with minimal corruption, it is in their best interest to collect a sufficient amount of data and share it truthfully.
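The sketch below illustrates the validation-and-corruption idea from the lay summary. The function name, the Gaussian corruption, and the choice to scale the noise by the absolute mean gap are our own illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def corrupted_return(own_data, others_data, rng):
    """Return the other participants' data, corrupted in proportion to how far
    the submitter's sample mean deviates from the others' sample mean.

    Illustrative sketch only: the Gaussian noise and |gap| scaling are
    assumptions, not the paper's exact corruption scheme.
    """
    gap = np.mean(own_data) - np.mean(others_data)  # validation statistic
    noise = rng.normal(0.0, abs(gap), size=len(others_data))  # bigger gap -> more corruption
    return others_data + noise

# A truthful, sufficiently large submission yields a small mean gap and hence a
# nearly uncorrupted return; fabricated or skimpy data inflates the gap.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0
honest = rng.normal(mu, sigma, size=100)
others = rng.normal(mu, sigma, size=1000)
fabricated = np.full(100, 3.0)  # a lazy, dishonest submission

print(np.std(corrupted_return(honest, others, rng) - others))      # small corruption
print(np.std(corrupted_return(fabricated, others, rng) - others))  # large corruption
```

Under this toy scheme, reporting fabricated data shifts the submitter's sample mean, which directly degrades the quality of the data they receive back, so truthful collection and sharing is the cheapest way to get an accurate estimate.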
Primary Area: Theory->Game Theory
Keywords: Mechanism design, collaborative learning, mean estimation
Submission Number: 5162