Using gradients to check sensitivity of MCMC-based analyses to removing data

Published: 27 Jun 2024, Last Modified: 20 Aug 2024
Venue: Differentiable Almost Everything
License: CC BY 4.0
Keywords: Gradients, sensitivity, MCMC, data removal
TL;DR: We provide an approximation to how conclusions drawn from MCMC-based Bayesian analyses change when a small fraction of the data is removed.
Abstract: If the conclusion of a data analysis is sensitive to dropping very few data points, that conclusion might hinge on the particular data at hand rather than representing a more broadly applicable truth. To check for this sensitivity, one idea is to consider every small data subset, drop it, and re-run our analysis. But the number of re-runs needed is combinatorially large. Recent work proposes a differentiable relaxation to find the worst-case subset, but that work was developed for conclusions based on estimating equations and does not directly handle Bayesian posterior approximations computed via MCMC. We make two principal contributions. We adapt the existing data-dropping relaxation to estimators computed via MCMC; in particular, we re-use existing MCMC draws to estimate the necessary derivatives via a covariance relationship. Observing that Monte Carlo errors induce variability in the estimates, we use a variant of the bootstrap to quantify this uncertainty. Empirically, our method is accurate in simple models, such as linear regression. In models with complex structure, such as hierarchies, the performance of our method is mixed.
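To make the abstract's two ingredients concrete, here is a minimal NumPy sketch of the general idea: the derivative of a posterior expectation with respect to a per-datum weight can be estimated from existing MCMC draws as the sample covariance between the quantity of interest and that datum's log-likelihood, and Monte Carlo uncertainty in those estimates can be gauged by resampling the draws. All function names, the greedy subset selection, and the use of a plain i.i.d. bootstrap over draws are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def per_datum_influences(g_draws, loglik_draws):
    """Estimate d E[g(theta)] / d w_n at weights w = 1 via the covariance identity.

    g_draws:      (S,) quantity of interest evaluated at S MCMC draws.
    loglik_draws: (S, N) log p(x_n | theta^(s)) for each draw s and datum n.
    Returns: (N,) estimates of Cov(g(theta), log p(x_n | theta)) under the posterior.
    """
    g_centered = g_draws - g_draws.mean()
    ll_centered = loglik_draws - loglik_draws.mean(axis=0)
    return (ll_centered * g_centered[:, None]).mean(axis=0)

def worst_case_drop(influences, alpha, n_total):
    """Approximate the worst-case decrease in E[g] from dropping at most a
    fraction alpha of the data, using the linear (influence) approximation."""
    k = int(np.floor(alpha * n_total))
    # Dropping datum n changes E[g] by roughly -influence_n, so the largest
    # decrease comes from dropping the k most positive influences.
    order = np.argsort(-influences)
    drop_idx = order[:k]
    approx_change = -influences[drop_idx].sum()
    return drop_idx, approx_change

def bootstrap_influence_se(g_draws, loglik_draws, n_boot=200, seed=None):
    """Quantify Monte Carlo error by resampling MCMC draws with replacement.
    (A crude i.i.d. bootstrap; autocorrelated chains would call for a block variant.)"""
    rng = np.random.default_rng(seed)
    S = g_draws.shape[0]
    boots = np.empty((n_boot, loglik_draws.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, S, size=S)
        boots[b] = per_datum_influences(g_draws[idx], loglik_draws[idx])
    return boots.std(axis=0)
```

In this sketch the covariance step re-uses draws that were already computed for the original analysis, so checking sensitivity requires no re-running of MCMC; the bootstrap only resamples those same draws to reflect Monte Carlo variability, not data variability.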
Submission Number: 8