Bivariate Causal Discovery with Proxy Variables: Integral Solving and Beyond

Published: 01 May 2025, Last Modified: 18 Jun 2025. Venue: ICML 2025 poster. License: CC BY 4.0.
TL;DR: We propose a proxy-based, integral-equation method to identify causal relations when unobserved variables are present.
Abstract: Bivariate causal discovery is challenging when unmeasured confounders exist. To adjust for the resulting bias, previous methods employed a proxy variable (*i.e.*, a negative control outcome (NCO)) to test the treatment-outcome relationship through integral equations, and assumed that a violation of this equation indicates a causal relationship. On this basis, they established asymptotic properties for causal hypothesis testing. However, these methods either relied on parametric assumptions or required discretizing continuous variables, which may lead to information loss. Moreover, it is unclear when this underlying integral-related assumption holds, making it difficult to justify their use in practice. To address these problems, we first consider the scenario where only an NCO is available. We propose a novel non-parametric procedure that enjoys asymptotic properties and preserves more information. In addition, we find that when the NCO affects the outcome, the above integral-related assumption may not hold, rendering the causal relation unidentifiable. Informed by this, we further consider the scenario in which a negative control exposure (NCE) is also available. In this scenario, we construct another integral restriction aided by this proxy, which can discover causation even when the NCO affects the outcome. We demonstrate these findings and the effectiveness of our proposals through comprehensive numerical studies.
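The abstract does not write out the integral equation. As a minimal sketch, assume the NCO-only restriction takes the common "confounding bridge" form: if X does not cause Y, some function h satisfies E[Y | X] = E[h(W) | X], and a violation signals causation (the assumption the abstract describes). Solvability can then be checked by fitting h with sieve two-stage least squares (X-polynomials as instruments, W-polynomials for h) and applying a Sargan overidentification test. This is a parametric-sieve stand-in for illustration only, not the paper's non-parametric procedure. The names `bridge_overid_test`, `simulate`, `poly_basis`, the polynomial degrees, and the toy model with a binary hidden confounder U are all hypothetical.

```python
import numpy as np
from scipy import stats


def poly_basis(v, degree):
    """Columns [1, v, v**2, ..., v**degree]: a simple (hypothetical) sieve basis."""
    return np.vander(v, degree + 1, increasing=True)


def bridge_overid_test(x, y, w, deg_w=1, deg_x=4):
    """Sargan-type test of H0: some h solves E[Y - h(W) | X] = 0.

    Illustrative stand-in, not the paper's method. h(W) is restricted to a
    polynomial of degree deg_w in W, and the conditional moment is enforced
    through polynomial instruments in X (requires deg_x > deg_w).
    Returns (J, p_value); J is asymptotically chi2(deg_x - deg_w) under H0.
    """
    n = len(y)
    phi = poly_basis(w, deg_w)   # basis for the bridge function h(W)
    psi = poly_basis(x, deg_x)   # instruments: functions of X
    # First stage: project the W-basis onto the span of the X-instruments.
    gamma, *_ = np.linalg.lstsq(psi, phi, rcond=None)
    phi_hat = psi @ gamma
    # Second stage (2SLS): coefficients of h, then residuals using the real W.
    beta, *_ = np.linalg.lstsq(phi_hat, y, rcond=None)
    resid = y - phi @ beta
    # Sargan statistic: n * R^2 from regressing the residuals on the instruments.
    delta, *_ = np.linalg.lstsq(psi, resid, rcond=None)
    fitted = psi @ delta
    j_stat = n * (fitted @ fitted) / (resid @ resid)
    df = psi.shape[1] - phi.shape[1]
    return j_stat, stats.chi2.sf(j_stat, df)


def simulate(n, effect, rng):
    """Hypothetical toy model: binary hidden confounder U; W is an NCO (U -> W only)."""
    u = rng.binomial(1, 0.5, size=n)
    x = 2.0 * u + rng.normal(size=n)
    w = 2.0 * u + 0.5 * rng.normal(size=n)
    y = 3.0 * u + effect * x + rng.normal(size=n)  # effect = 0 means no X -> Y edge
    return x, y, w


rng = np.random.default_rng(0)
j0, p0 = bridge_overid_test(*simulate(5000, 0.0, rng))  # no X -> Y edge
j1, p1 = bridge_overid_test(*simulate(5000, 1.0, rng))  # X -> Y present
print(f"null: J={j0:.1f}, p={p0:.3f}; causal: J={j1:.1f}, p={p1:.2e}")
```

The binary confounder is a deliberate simplification. Under the null, h(W) = 1.5 W solves the equation exactly, so the test's size is not muddied by sieve approximation error. Under the alternative, E[h(W) | X] can only be affine in P(U = 1 | X), which a direct linear effect of X breaks.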
Lay Summary: Discovering cause-and-effect relationships from data is difficult when hidden confounders exist. A popular approach in causal inference is to use negative control outcomes, proxy variables that help detect bias. Existing methods based on this idea often rely on strong assumptions or need large datasets to work well. We propose a new method that avoids these limitations: it makes fewer assumptions and works better with smaller samples. We also show that when the proxy itself affects the outcome, standard assumptions may fail, making the causal relationship impossible to identify. To address this, we introduce a new strategy that uses an additional proxy, a negative control exposure, to uncover causal effects even in these harder cases. Our method is supported by both theory and experiments.
Primary Area: General Machine Learning->Causality
Keywords: causal discovery, unmeasured confounding, proxy variables, integral equation, causal hypothesis testing, negative control
Submission Number: 4841