An Instrumental Value for Data Production and its Application to Data Pricing

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY 4.0
Abstract: We develop a framework for capturing the instrumental value of data production processes, which accounts for two key factors: (a) the context of the agent’s decision-making; (b) how much data or information the buyer already possesses. We "micro-found" our data valuation function by establishing its connection to classic notions of signals and information design in economics. When instantiated in Bayesian linear regression, our value naturally corresponds to information gain. Applying our proposed data value in Bayesian linear regression for monopoly pricing, we show that if the seller can fully customize data production, she can extract the first-best revenue (i.e., full surplus) from any population of buyers, i.e., achieving first-degree price discrimination. If data can only be constructed from an existing data pool, this limits the seller’s ability to customize, and achieving first-best revenue becomes generally impossible. However, we design a mechanism that achieves seller revenue at most $\log(\kappa)$ less than the first-best, where $\kappa$ is the condition number associated with the data matrix. As a corollary, the seller extracts the first-best revenue in the multi-armed bandits special case.
Lay Summary: How do we determine the value of data to an agent? It depends on the problem the agent is facing and the amount of information they already possess. From the perspective of rational agent decision-making, we propose an instrumental value framework that characterizes valid data valuation. Notably, we show that in the case of Bayesian linear regression, this value coincides with information gain. We then apply our instrumental value framework to a monopoly data pricing setting. We find that when the seller can perfectly customize data production, the buyer's surplus is zero, leading to severe market asymmetry and unfairness. In contrast, under limited customization, we derive an upper bound on the buyer's surplus. This prompts broader reflections on how to price such novel products in the data era and the resulting concerns about market fairness.
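To make the information-gain claim concrete, here is a minimal sketch, not the paper's own construction. It assumes a standard Bayesian linear regression model y = Xθ + ε with a Gaussian prior θ ~ N(0, Σ₀) and Gaussian noise of variance σ², for which the mutual information between θ and y is ½ log det(I + Σ₀XᵀX/σ²). The function names `information_gain` and `marginal_value`, and the choice to measure value in nats, are illustrative assumptions. The second function reflects the idea that a dataset's value depends on what the buyer already holds.

```python
import numpy as np


def information_gain(X, prior_cov, noise_var=1.0):
    """Mutual information (nats) between theta and y for y = X @ theta + noise.

    Assumes theta ~ N(0, prior_cov) and i.i.d. Gaussian noise with variance
    noise_var. This is the textbook Gaussian formula, used here as a stand-in
    for the data value discussed in the abstract.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    d = X.shape[1]
    M = np.eye(d) + prior_cov @ X.T @ X / noise_var
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * logdet


def marginal_value(X_new, X_owned, prior_cov, noise_var=1.0):
    """Extra information gain from X_new for a buyer who already holds X_owned."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    X_owned = np.atleast_2d(np.asarray(X_owned, dtype=float))
    combined = np.vstack([X_owned, X_new])
    return (information_gain(combined, prior_cov, noise_var)
            - information_gain(X_owned, prior_cov, noise_var))


if __name__ == "__main__":
    prior = np.eye(2)                 # hypothetical prior covariance
    x = np.array([[1.0, 0.0]])        # one hypothetical data point (feature row)
    fresh = information_gain(x, prior)      # buyer holds no data yet
    repeat = marginal_value(x, x, prior)    # buyer already holds the same row
    print(f"value with no prior data: {fresh:.4f} nats")
    print(f"value given the same row: {repeat:.4f} nats")
```

Under these assumptions the same feature row is worth less to a buyer who already holds it, which is the dependence on existing information that the summary describes.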
Primary Area: Theory->Game Theory
Keywords: Instrumental Value, Data Production Process, Data Pricing, Data Customization
Submission Number: 8273