Causal Fine-Tuning under Latent Confounded Shift

Jialin Yu; Yuxiang Zhou; Haoxuan Li; Junchi Yu; Mengyue Yang; Yulan He; Nevin L. Zhang; Philip Torr; Ricardo Silva

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, CC BY 4.0

Abstract: Adapting to latent confounded shift remains a core challenge in modern AI. This setting is driven by hidden variables that induce spurious correlations between inputs and outputs during training, leading models to rely on non-causal shortcuts. For example, a model may learn to treat metadata (e.g., a data source such as "Amazon") as a proxy for positive sentiment, causing failure when reviews from that source become predominantly negative during deployment. To address this *latent confounded shift*, we introduce Causal Fine-Tuning (CFT). Using a structural causal model as an inductive bias, we derive sufficient identification conditions that motivate a fine-tuning objective for decomposing representations into high-level stable and low-level shift-sensitive components. Instantiating this framework in BERT, we show that learning such causal/spurious representations and adjusting them accordingly yield a more robust predictor. Experiments on spurious correlation injection attacks in text demonstrate that our method outperforms black-box domain generalization baselines, highlighting the benefits of explicitly modeling causal structure.
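The shortcut failure described in the abstract can be reproduced in a toy simulation. This is a minimal illustrative sketch, not the CFT method or the authors' code. It assumes binary variables, a hypothetical latent confounder `u` that drives both the sentiment label and the source metadata, and made-up probabilities (`q_label`, `r_source`, `p_content`); the function names `sample` and `accuracy` are likewise hypothetical. Only the strength of the `u`-to-source link changes between training and deployment, while the content-to-label link stays fixed.

```python
import random


def sample(n, r_source, seed, q_label=0.95, p_content=0.75):
    """Draw n toy reviews as (source, content, label) triples of 0/1 values.

    Hypothetical generative story (an assumption for illustration):
    - u is a latent confounder that the learner never observes;
    - the label y copies u with probability q_label;
    - the source metadata s copies u with environment-specific probability r_source;
    - the content feature c copies y with probability p_content (the stable mechanism).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.random() < 0.5
        y = u if rng.random() < q_label else not u
        s = u if rng.random() < r_source else not u
        c = y if rng.random() < p_content else not y
        data.append((int(s), int(c), int(y)))
    return data


def accuracy(data, feature):
    """Accuracy of the rule 'predict label = value of this feature'.

    feature index 0 is the source metadata, index 1 is the content feature.
    """
    return sum(x[feature] == x[2] for x in data) / len(data)


# Training: source strongly tracks u. Deployment: the source link is reversed.
train = sample(20000, r_source=0.95, seed=0)
test = sample(20000, r_source=0.05, seed=1)

FEATURES = {"source": 0, "content": 1}
# Naive empirical risk minimisation over two single-feature rules:
# keep whichever rule fits the training data best.
chosen = max(FEATURES, key=lambda name: accuracy(train, FEATURES[name]))

for name, idx in FEATURES.items():
    print(f"{name:8s} train={accuracy(train, idx):.3f} test={accuracy(test, idx):.3f}")
print("selected by training accuracy:", chosen)
```

Under these assumed probabilities the source rule agrees with the label about 90% of the time in training but only about 10% at deployment, while the content rule stays near 75% in both environments. Selecting by in-distribution fit therefore picks the shortcut. The sketch only shows the failure mode that motivates CFT; it does not implement the paper's representation decomposition.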

Lay Summary: AI systems can make unreliable decisions when they learn misleading shortcuts from training data. For example, a model that predicts review sentiment might notice that reviews from one website are often positive and use the website name as a shortcut, instead of understanding the actual meaning of the review. This can cause the model to fail when the same website later contains mostly negative reviews. Our work aims to make foundation models more robust to this kind of problem. We introduce a fine-tuning method that encourages the model to separate information that is truly useful for the task from information that only appears useful because of hidden patterns in the training data. By reducing the model’s reliance on these unstable shortcuts, our method improves performance when the test data differs from the training data.

Link To Code: https://github.com/jialin-yu/CausalFineTuning

Primary Area: General Machine Learning->Causality

Keywords: causality, latent confounded shift, causal generalisation

Submission Number: 6680