UnCoVAEr: Estimating Causal Concept Effects under Visual Latent Confounding

18 Sept 2025 (modified: 01 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Concept-based explanations, Causal effect estimation, Latent confounders, Interpretability
TL;DR: UnCoVAEr provides a principled approach for estimating causal concept effects on images in the presence of latent confounders
Abstract: Estimating the effect of human-interpretable concepts on model predictions is crucial for explaining and auditing machine learning systems, as well as for mitigating their reliance on spurious correlations. Most existing approaches assume complete concept annotations, but in practice some concepts may remain unobserved and act as confounders, biasing causal effect estimates. We introduce **UnCoVAEr** (Unobserved Confounding Variational AutoEncoder), a latent-variable model that partitions image latent representations into a confounder-related component and a non-confounding residual component. This allows us to (i) identify which observed concepts are confounded, (ii) obtain corrected, unbiased effect estimates via backdoor adjustment, and (iii) learn confounder-proxy variables that align with the underlying latent factors. On a controlled semi-synthetic MorphoMNIST benchmark, we show that UnCoVAEr yields substantially less biased effect estimates than prior methods, providing practitioners with a practical tool for trustworthy concept-level causal inference on partially annotated image datasets.
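To make the backdoor-adjustment step concrete, here is a minimal sketch of how an adjusted concept effect could be estimated once a confounder proxy has been learned. This is not the paper's implementation; the function name, the 1-D proxy, and the binary concept are illustrative assumptions. It stratifies on the proxy and averages within-stratum effect differences, i.e. an empirical estimate of $\mathbb{E}[Y \mid do(C{=}1)] - \mathbb{E}[Y \mid do(C{=}0)] = \sum_z P(z)\,(\mathbb{E}[Y \mid C{=}1, z] - \mathbb{E}[Y \mid C{=}0, z])$.

```python
import numpy as np

def backdoor_adjusted_effect(y, c, z_proxy, n_bins=10):
    """Estimate E[Y | do(C=1)] - E[Y | do(C=0)] via backdoor adjustment.

    y       : (n,) model predictions
    c       : (n,) binary concept labels (hypothetical; the paper's
              concepts need not be binary)
    z_proxy : (n,) learned confounder-proxy values, e.g. from a VAE encoder
    """
    # Discretize the continuous proxy into quantile strata approximating P(Z).
    edges = np.quantile(z_proxy, np.linspace(0.0, 1.0, n_bins + 1))
    strata = np.clip(np.digitize(z_proxy, edges[1:-1]), 0, n_bins - 1)

    effect = 0.0
    for s in range(n_bins):
        mask = strata == s
        if not mask.any():
            continue
        w = mask.mean()  # empirical weight P(Z in stratum s)
        treated = y[mask & (c == 1)]
        control = y[mask & (c == 0)]
        # Skip strata lacking overlap; a real estimator would handle
        # positivity violations more carefully.
        if len(treated) and len(control):
            effect += w * (treated.mean() - control.mean())
    return effect
```

Without the stratification (i.e. comparing `y[c == 1].mean()` against `y[c == 0].mean()` directly), the estimate would absorb the confounder's influence, which is exactly the bias the abstract describes.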
Primary Area: interpretability and explainable AI
Submission Number: 12496