Empirically Calibrated Conditional Independence Tests

Published: 03 Feb 2026, Last Modified: 03 Feb 2026, AISTATS 2026 Poster, CC BY 4.0
TL;DR: We optimize an adversary to find the worst-case FDR for a given conditional independence test and then calibrate against it.
Abstract: Conditional independence (CI) tests are widely used for causal discovery and feature selection. Even with False Discovery Rate (FDR) control procedures, they often fail to provide frequentist guarantees in practice. We identify two pitfalls: (i) when sample sizes are small, even correctly specified models fail to estimate the noise level accurately enough to control the error, and (ii) when sample sizes are large but models are misspecified, unaccounted dependencies skew the test's behavior, so its p-values are no longer uniform under the null. We propose Empirically Calibrated Conditional Independence Tests (ECCIT), a method that measures and corrects for miscalibration. Given a dataset X, we train an adversary that selects features and constructs responses Y to maximize a miscalibration objective for a chosen base CI test (e.g., GCM, HRT). We then fit a monotone calibration map that adjusts the base-test p-values in proportion to the observed miscalibration. We introduce two miscalibration metrics and evaluate their empirical performance. In well-specified settings, we show that finite-sample errors in noise estimation explain the gap between nominal and realized size. In misspecified examples, ECCIT achieves valid FDR control with higher power, outperforming existing calibration strategies while remaining test-agnostic.
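The abstract does not include code; the sketch below is only a rough illustration of the monotone-calibration idea it describes. It assumes null datasets (standing in for the adversary's constructions) are already available, uses a simple partial-correlation test as a stand-in for the base CI test, and fits the monotone map with isotonic regression against the empirical null CDF. All function names and these modeling choices are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch: calibrate a base CI test's p-values with a monotone map
# fit on p-values computed from (adversarially constructed) null datasets.
# The base test, the null generator, and isotonic regression are assumptions.
import numpy as np
from scipy import stats
from sklearn.isotonic import IsotonicRegression


def base_ci_pvalue(x, y, z):
    """Stand-in base CI test: partial correlation of x and y given z,
    with a t-approximation for the null distribution."""
    # Residualize x and y on (1, z) with least squares, then correlate residuals.
    Z = np.column_stack([np.ones(len(z)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    t = r * np.sqrt((len(x) - 3) / max(1e-12, 1.0 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=len(x) - 3)


def fit_calibration_map(null_pvalues):
    """Fit a monotone map so calibrated p-values are approximately uniform
    on the observed null p-values (map p to its empirical null CDF)."""
    p = np.sort(np.asarray(null_pvalues))
    ecdf = np.arange(1, len(p) + 1) / (len(p) + 1)
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(p, ecdf)
    return iso


# Toy usage: collect p-values on data constructed so that X ⟂ Y | Z holds,
# but with heavy-tailed noise that stresses the base test's assumptions.
rng = np.random.default_rng(0)
null_ps = []
for _ in range(500):
    z = rng.normal(size=200)
    x = z + rng.normal(size=200)
    y = z + 0.5 * rng.standard_t(df=3, size=200)
    null_ps.append(base_ci_pvalue(x, y, z))

calibrator = fit_calibration_map(null_ps)

raw_p = base_ci_pvalue(rng.normal(size=200), rng.normal(size=200), rng.normal(size=200))
adjusted_p = float(calibrator.predict([raw_p])[0])  # calibrated p-value
```

In this toy version the adjustment is data-driven but test-agnostic: any base test that returns a p-value can be plugged into `base_ci_pvalue`'s role, and the isotonic map only stretches or shrinks p-values monotonically, so rankings of hypotheses are preserved.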
Submission Number: 2367