Keywords: Causal Inference, Partial Identification, Invariance, Data Augmentation
TL;DR: We repurpose symmetry-based data augmentation as an interventional tool to provably sharpen the bounds on causal effects derived from partial identification.
Abstract: We present a novel framework for using knowledge of data symmetries to sharpen bounds in causal *partial identification (PI)*. The causal effect of the treatment $X$ on outcome $Y$ is generally not identifiable from observational data alone if their common causes, also known as confounders, are unobserved. PI entails estimating bounds on such treatment effects by solving a constrained optimization problem that encodes different assumptions imposed on data generation. PI is useful in many application domains where such bounds are sufficient to inform policy decisions, even if the treatment effect itself is not identifiable. We show that knowledge of symmetries in data generation—formalized as invariance under transformation groups—provides additional constraints that tighten these bounds. We operationalize this insight through two approaches: (1) adding explicit invariance error constraints to existing PI methods, and (2) applying symmetry-preserving *data augmentation (DA)* as a pre-processing step. Under a linear Gaussian model, we show that the latter yields bounds that are provably valid (containing the true causal effect), sharper (smaller identified sets), and more robust (lower worst-case error). The key mechanism is that randomized symmetry transformations introduce exogenous variation in $X$ that cannot be attributed to confounding, thereby reducing ambiguity in the identified set. Experiments on synthetic and real data validate our approach. More broadly, our findings establish that known data symmetries—ubiquitously employed in DA for variance reduction—can be repurposed as a principled tool for causal inference when point identification is impossible.
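The mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's method: the structural coefficients, the choice of a joint sign-flip symmetry on $(X, Y)$, and all variable names are illustrative assumptions. It shows that randomizing over an assumed symmetry group decorrelates the augmented treatment from the unobserved confounder while preserving the observable regression slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Linear Gaussian SCM with an unobserved confounder U (coefficients are illustrative).
U = rng.normal(size=n)
X = 1.0 * U + rng.normal(size=n)             # treatment, confounded by U
Y = 0.5 * X + 1.0 * U + rng.normal(size=n)   # outcome; assumed true causal effect = 0.5

# Assumed symmetry: the joint law of (X, Y) is invariant under (x, y) -> (-x, -y).
# Symmetry-preserving DA: multiply each (X, Y) pair by an independent random sign.
s = rng.choice([-1.0, 1.0], size=n)
Xa, Ya = s * X, s * Y

# The augmentation sign s is exogenous randomness, independent of U,
# so it decorrelates the augmented treatment from the confounder ...
print(np.corrcoef(Xa, U)[0, 1])
# ... while the observable regression slope Cov(X, Y) / Var(X) is unchanged.
print(np.cov(X, Y)[0, 1] / X.var(), np.cov(Xa, Ya)[0, 1] / Xa.var())
```

In this toy setting the augmented data carry the same observable statistics but rule out confounding-based explanations of the sign-flip variation, which is the intuition behind the tightened identified set.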
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 22450