Is Merging Worth It? Securely Evaluating the Information Gain for Causal Dataset Acquisition

Published: 22 Jan 2025, Last Modified: 06 Mar 2025, Venue: AISTATS 2025 Poster, License: CC BY 4.0
TL;DR: We develop methodology to evaluate the expected information gain for merging datasets in causal inference problems. We also provide a secure cryptographic procedure to be used alongside this.
Abstract: Merging datasets across institutions is a lengthy and costly procedure, especially when it involves private information. Data hosts may therefore want to prospectively gauge which datasets are most beneficial to merge with, without revealing sensitive information. For causal estimation this is particularly challenging, as the value of a merge depends not only on the reduction in epistemic uncertainty but also on the improvement in overlap. To address this challenge, we introduce the first \emph{cryptographically secure} information-theoretic approach for quantifying the value of a merge in the context of heterogeneous treatment effect estimation. We do this by evaluating the \emph{Expected Information Gain} (EIG) using multi-party computation to ensure that no raw data is revealed. We further demonstrate that our approach can be combined with differential privacy (DP) to meet arbitrary privacy requirements whilst preserving more accurate computation than DP alone. To the best of our knowledge, this work presents the first privacy-preserving method for dataset acquisition tailored to causal estimation. Code is publicly available: \url{https://github.com/LucileTerminassian/causal_prospective_merge}.
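To make the role of overlap in the EIG concrete, the following is a minimal illustrative sketch and not the paper's method: it assumes a toy conjugate Gaussian model in which the treatment effect in each covariate stratum has an independent N(0, prior_var) prior and every observation has noise variance noise_var. Under those assumptions the EIG of a candidate dataset, given the host's data, has a closed form that depends only on per-stratum sample counts. The function name, parameters, and example counts are all hypothetical, and no cryptography or differential privacy is modelled.

```python
import math


def expected_information_gain(host_counts, candidate_counts,
                              noise_var=1.0, prior_var=1.0):
    """EIG (in nats) of merging a candidate dataset into the host's data.

    Toy assumption: one independent Gaussian effect per stratum, so the
    gain is the sum over strata of 0.5 * log(precision_after / precision_before).
    """
    eig = 0.0
    for n_host, m_new in zip(host_counts, candidate_counts):
        # Posterior precision of the stratum effect after the host's own data.
        precision_before = 1.0 / prior_var + n_host / noise_var
        # Each new observation adds 1 / noise_var to the precision.
        eig += 0.5 * math.log1p(m_new / (noise_var * precision_before))
    return eig


# Hypothetical host with almost no data in the third stratum (poor overlap).
host = [100, 100, 2]
candidate_a = [50, 50, 0]   # more data where the host is already well covered
candidate_b = [0, 0, 100]   # same total size, but fills the sparse stratum

eig_a = expected_information_gain(host, candidate_a)
eig_b = expected_information_gain(host, candidate_b)
print(f"EIG candidate A: {eig_a:.3f} nats")
print(f"EIG candidate B: {eig_b:.3f} nats")
```

In this toy setting the two candidates have the same number of rows, yet the one that fills the host's sparse stratum yields the larger gain, which mirrors the abstract's point that the value of a merge depends on improved overlap as well as on reduced uncertainty.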
Submission Number: 505