Incorporating Interventional Independence Improves Robustness against Interventional Distribution Shift

TMLR Paper2851 Authors

11 Jun 2024 (modified: 22 Oct 2024). Rejected by TMLR. License: CC BY 4.0

Abstract: We consider learning discriminative representations of variables related to each other via a causal graph. To learn representations that are robust against interventional distribution shifts, the training dataset is augmented with interventional data in addition to existing observational data. However, even when the underlying causal model is known, existing approaches treat interventional data like observational data, ignoring the independence relations resulting from these interventions. This leads to representations that exhibit large disparities in predictive performance on observational and interventional data. The performance disparity worsens when the quantity of interventional data available for training is limited. In this paper, (1) we first identify a strong correlation between this performance disparity and how closely the representations adhere to the statistical independence conditions induced by the underlying causal model during interventions. (2) For linear models, we derive sufficient conditions on the proportion of interventional data during training under which enforcing statistical independence between the representations of the intervened node and its non-descendants during interventions can lower the test-time error on interventional data. Following these insights, we propose RepLIn, an algorithm that explicitly enforces this statistical independence during interventions. We demonstrate the utility of RepLIn on synthetic and real face image datasets. Our experiments show that RepLIn scales with the number of nodes in the causal graph and, compared to the ERM baselines, improves the robustness of representations against interventional distribution shifts of both continuous and discrete latent variables.
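
To illustrate what "enforcing statistical independence between representations" could look like in practice, here is a minimal sketch. It assumes a kernel dependence measure (the biased HSIC estimator with an RBF kernel); the abstract does not say which measure RepLIn uses, so this is an illustrative stand-in, not the authors' method. The names `rbf_kernel`, `hsic`, `sigma`, and `lam` are hypothetical.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """RBF Gram matrix for the rows of x (shape: n_samples x dim)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate; close to 0 when x and y are independent."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

# Toy "representations" on a batch of interventional samples (assumed data).
rng = np.random.default_rng(0)
z_intervened = rng.normal(size=(200, 2))    # intervened node
z_dependent = z_intervened + 0.1 * rng.normal(size=(200, 2))
z_independent = rng.normal(size=(200, 2))   # non-descendant, independent

penalty_dep = hsic(z_intervened, z_dependent)
penalty_ind = hsic(z_intervened, z_independent)

# Hypothetical combined objective: task loss plus weighted penalty,
# with the penalty computed only on interventional samples.
lam = 1.0
task_loss = 0.0  # placeholder for a prediction loss
total_loss = task_loss + lam * penalty_dep
```

In this sketch the penalty is larger for the dependent pair than for the independent pair, so minimizing it on interventional samples pushes the two representations toward independence. How such a term is weighted against the prediction loss is a design choice the abstract does not specify.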

Submission Length: Long submission (more than 12 pages of main content)

Changes Since Last Submission:
- New results on CivilComments dataset and new baselines for Windmill experiment
- Clarification on some terminology and assumptions
- A paragraph on limitations
- Minor writing changes

Assigned Action Editor: ~Yu_Yao3

Submission Number: 2851
