Meta-Analysis with Untrusted Data

Shiva Kaul, Geoffrey J. Gordon

Published: 01 Jan 2024, Last Modified: 23 Aug 2025. Venue: ML4H@NeurIPS 2024. License: CC BY-SA 4.0

Abstract: Meta-analyses are usually conducted on small amounts of “trusted” data, ideally from randomized, controlled trials. Excluding untrusted (observational) data — such as medical records and related scientific literature — avoids potential confounding and ensures unbiased conclusions. Unfortunately, this exclusion can reduce predictive accuracy to the point of clinical irrelevance, especially when trials are heterogeneous. This paper shows how untrusted data can be safely incorporated into meta-analysis, improving predictions without sacrificing rigor or introducing unproven assumptions. Our approach, called conformal meta-analysis, consists of (1) learning a (potentially flawed) prior distribution from the untrusted data, (2) using the prior and trusted data to derive a simple, fully-conformal prediction interval for the observed trial effect, and (3) analytically extracting an interval for the true (unobserved) effect. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper conceptually realigns meta-analysis as a foundation for evidence-based medicine, embracing heterogeneity and untrusted data for more nuanced, precise predictions.
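Step (2) of the abstract can be illustrated with a minimal sketch of full conformal prediction. This is not the authors' algorithm: the shrinkage-toward-prior-mean model, the function name `full_conformal_interval`, the parameters `prior_mean`, `prior_weight`, `grid_size`, and `pad`, and the toy effect values are all assumptions made for illustration. The sketch assumes the trial effects are exchangeable and covers only the interval for the observed effect of a new trial; the analytic extraction of an interval for the true effect in step (3) is not shown.

```python
def full_conformal_interval(effects, prior_mean, prior_weight=1.0,
                            alpha=0.1, grid_size=2001, pad=3.0):
    """Grid-based full conformal interval for the next observed trial effect.

    Hypothetical illustration: the "model" is a mean shrunk toward
    `prior_mean`, standing in for a prior learned from untrusted data.
    """
    effects = list(effects)
    n = len(effects)

    # Candidate values are searched on a padded grid around the data and prior.
    lo = min(min(effects), prior_mean)
    hi = max(max(effects), prior_mean)
    span = (hi - lo) or 1.0
    lo, hi = lo - pad * span, hi + pad * span

    accepted = []
    for k in range(grid_size):
        y = lo + (hi - lo) * k / (grid_size - 1)
        augmented = effects + [y]

        # Refit on the augmented data; the fit treats all points symmetrically,
        # which is what the conformal guarantee relies on.
        mu = (prior_weight * prior_mean + sum(augmented)) / (prior_weight + n + 1)

        # Nonconformity score: absolute residual from the shrunken mean.
        scores = [abs(v - mu) for v in augmented]

        # Conformal p-value: share of scores at least as large as the candidate's.
        p_value = sum(s >= scores[-1] for s in scores) / (n + 1)
        if p_value > alpha:
            accepted.append(y)

    if not accepted:
        return None
    return min(accepted), max(accepted)


# Toy usage with made-up trial effects (not data from the paper).
trial_effects = [0.1, 0.3, 0.2, 0.4, 0.25]
interval = full_conformal_interval(trial_effects, prior_mean=0.2, alpha=0.2)
print(interval)
```

In this sketch, coverage of the observed effect does not depend on the prior being accurate, only on exchangeability; a poor prior would be expected to widen or shift the interval rather than invalidate it. With very few trials and a small `alpha`, every grid value can be accepted, so the returned interval simply spans the search grid and carries no information.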

External IDs: dblp:conf/ml4h/KaulG24