Private Data Measurements for Decentralized Data Markets

ICLR 2024 Workshop DMLR Submission 45 Authors

Published: 04 Mar 2024, Last Modified: 02 May 2024, DMLR @ ICLR 2024, CC BY 4.0
Keywords: Data Markets, Data Valuation, Data Measurements
TL;DR: We identify challenges with centralized data brokers as well as existing data valuation methods and propose a data measurement framework based on measuring the relevance and diversity of a seller's data in relation to the buyer's data.
Abstract: As training data is fundamental to current machine learning, incentivizing data access will be crucial in data-limited application areas such as healthcare. Data markets have been proposed to incentivize greater data access. However, information asymmetry about data value between data owners and data consumers can prevent otherwise beneficial transactions from taking place. In this paper, we study data measurements of relevance and diversity to resolve this information asymmetry. Unlike previous work in data valuation, our heuristic-based approach is cheap to compute, task-agnostic, and does not require centralized data access: properties that are well suited to a decentralized marketplace setting. We evaluate our approach on several medical imaging datasets and find that relevance measurements are effective at discriminating between data domains, while diversity measures are more useful in selecting sellers that have similar distributions. Code for our experiments is available at https://github.com/clu5/data-valuation.
Primary Subject Area: Data collection and benchmarking techniques
Paper Type: Research paper: up to 8 pages
DMLR For Good Track: Participate in DMLR for Good Track
Participation Mode: Virtual
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 45
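
The abstract above does not spell out how relevance and diversity are computed, so the following is only a minimal sketch of one possible heuristic, not the authors' implementation. It assumes the buyer shares a small summary (mean, top-k principal directions, and the variance along each) and that each seller scores its own data against that summary. The function names `buyer_summary` and `seller_measurements`, the min/max variance ratio used for relevance, the normalized variance gap used for diversity, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def buyer_summary(x_buyer, k):
    """Return the buyer's mean, top-k principal directions, and their variances."""
    mean = x_buyer.mean(axis=0)
    centered = x_buyer - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s[:k] ** 2 / (len(x_buyer) - 1)
    return mean, vt[:k], variances


def seller_measurements(x_seller, mean, components, buyer_var, eps=1e-12):
    """Score a seller using only the buyer's summary, never the buyer's raw data.

    Both scores are assumed heuristics (geometric means over the k directions)
    and lie in [0, 1].
    """
    # Project seller data onto the buyer's directions, centered at the buyer's mean.
    proj = (x_seller - mean) @ components.T
    seller_var = (proj ** 2).sum(axis=0) / (len(x_seller) - 1)
    hi = np.maximum(np.maximum(buyer_var, seller_var), eps)
    lo = np.minimum(buyer_var, seller_var)
    k = len(buyer_var)
    # Relevance: how closely the seller's spread matches the buyer's per direction.
    relevance = float(np.prod(lo / hi) ** (1.0 / k))
    # Diversity: how much the seller's spread differs from the buyer's per direction.
    diversity = float(np.prod(np.abs(buyer_var - seller_var) / hi) ** (1.0 / k))
    return relevance, diversity


# Synthetic stand-ins for embedded data (assumption: 10-dimensional features).
rng = np.random.default_rng(0)
scales = np.linspace(3.0, 0.5, 10)
x_buyer = rng.normal(size=(500, 10)) * scales
x_same = rng.normal(size=(500, 10)) * scales               # same domain as buyer
x_other = rng.normal(size=(500, 10)) * scales[::-1] + 5.0  # shifted domain

summary = buyer_summary(x_buyer, k=5)
rel_same, div_same = seller_measurements(x_same, *summary)
rel_other, div_other = seller_measurements(x_other, *summary)
print("same domain: ", rel_same, div_same)
print("other domain:", rel_other, div_other)
```

In this sketch only the summary leaves the buyer and only two scalars leave each seller, which is in the spirit of the abstract's claim that no centralized data access is needed. Whether these particular formulas match the paper's measurements should be checked against the linked repository.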