Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift

Published: 13 Dec 2025, Last Modified: 13 Dec 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: We consider the problem of learning robust discriminative representations of causally related latent variables, given the underlying directed causal graph and a training set comprising passively collected observational data and interventional data obtained through targeted interventions on some of these latent variables. We aim to learn representations that are robust to the resulting interventional distribution shifts. Existing approaches treat interventional data like observational data and ignore the independence relations that arise from these interventions, even when the underlying causal model is known. As a result, their representations exhibit large disparities in predictive performance between observational and interventional data, and this disparity worsens when interventional training samples are scarce. In this paper, (1) we first identify a strong correlation between this performance disparity and the representations' violation of the statistical independence induced by interventions. (2) For linear models, we derive sufficient conditions on the proportion of interventional training data under which enforcing statistical independence between representations of the intervened node and its non-descendants lowers the test-time error on interventional data. Combining these insights, (3) we propose RepLIn, a training algorithm that explicitly enforces this statistical independence between representations during interventions. We demonstrate the utility of RepLIn on a synthetic dataset and on real image and text datasets, for facial attribute classification and toxicity detection respectively, with semi-synthetic causal structures. Our experiments show that RepLIn scales with the number of nodes in the causal graph and improves robustness against interventional distribution shifts of both continuous and discrete latent variables relative to ERM baselines.
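To make the core idea concrete: enforcing statistical independence between the representation of an intervened node and those of its non-descendants can be operationalized as a dependence penalty added to the training loss on interventional samples. The sketch below is illustrative only, not the authors' implementation (the paper's actual dependence measure is not shown here); it uses a simple cross-covariance penalty, which is zero exactly when two batches of representations are linearly uncorrelated. The function name and the choice of measure are assumptions for illustration.

```python
import numpy as np

def linear_dependence_penalty(z_a, z_b):
    """Squared Frobenius norm of the empirical cross-covariance between
    two batches of representations. Zero iff the batches are (linearly)
    uncorrelated; a kernel measure such as HSIC would capture nonlinear
    dependence as well.

    z_a: (n, d_a) representations of the intervened node
    z_b: (n, d_b) representations of its non-descendants
    """
    # Center each batch of representations.
    z_a = z_a - z_a.mean(axis=0, keepdims=True)
    z_b = z_b - z_b.mean(axis=0, keepdims=True)
    n = z_a.shape[0]
    # (d_a, d_b) empirical cross-covariance matrix.
    cov = z_a.T @ z_b / (n - 1)
    return float(np.sum(cov ** 2))
```

During training, this penalty would be evaluated only on interventional samples and weighted by a hyperparameter (the abstract mentions loss weights such as $\lambda_{\text{dep}}$), so that minimizing the total loss pushes the intervened node's representation toward independence from its non-descendants.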
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=pZRanZlab4
Changes Since Last Submission:
- Added experimental results on the effect of loss hyperparameters $\lambda_{\text{dep}}$ and $\lambda_{\text{self}}$.
- Added new baselines to the primary experiments.
- Added results for toxicity detection on the CivilComments dataset.
- Added a subsection on the limitations of the proposed method.
- Minor writing changes to further clarify our objectives and claims.
Assigned Action Editor: ~Yu_Yao3
Submission Number: 5315