Abstract: The presence of spurious correlations is a prevalent issue in deep learning models, frequently leading to impaired generalization and reduced robustness. To address this issue, we introduce a knowledge transfer mechanism that operates across spuriously correlated categories within the deep feature space. Specifically, we enhance the deep representations of individual samples by incorporating semantic information derived not only from their respective class distributions but also from the distributions of spuriously associated classes. This enrichment facilitates the generation of diverse class-specific factual and counterfactual augmented features, promoting more robust and discriminative representations. We then show that, in the limit of infinitely many augmentations, explicit augmentation can be replaced by optimizing a closed-form surrogate robust loss. As spurious associations between samples and classes evolve during training, we develop a REINFORCE-based training framework called Dynamic Knowledge Transfer (DKT) that dynamically adjusts the direction and intensity of knowledge transfer. Within this framework, a target network is trained with the derived robust loss to enhance robustness, while a strategy network generates sample-wise augmentation strategies dynamically and automatically. Beyond supervised learning, DKT extends to semi-supervised learning by generating pseudo labels through a novel logit-calibrated prediction function under the infinite-augmentation setting, thus leveraging unlabeled data in a robust manner. Extensive experiments across diverse benchmarks demonstrate the effectiveness of DKT in mitigating spurious correlations, consistently achieving state-of-the-art performance in both supervised and semi-supervised scenarios.
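The abstract does not spell out the surrogate robust loss, but the infinite-augmentation argument it invokes has a well-known closed form for Gaussian feature-space augmentation (as in implicit semantic data augmentation). Below is a minimal NumPy sketch of that style of surrogate: an upper bound on the expected cross-entropy when a sample's feature is perturbed by infinitely many draws from a class-conditional Gaussian. The function name, the strength parameter `lam`, and the use of per-class covariances are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def surrogate_robust_loss(f, y, W, b, cov, lam=0.5):
    """Closed-form upper bound on the expected cross-entropy when the
    deep feature f is augmented by infinitely many Gaussian samples
    f_aug ~ N(f, lam * cov[y]) (an ISDA-style surrogate; illustrative).

    f:   (d,)      deep feature of one sample
    y:   int       ground-truth class index
    W:   (C, d)    softmax classifier weights
    b:   (C,)      softmax classifier biases
    cov: (C, d, d) per-class feature covariance estimates
    lam: float     augmentation strength (lam=0 recovers standard CE)
    """
    dw = W - W[y]                     # (C, d): w_j - w_y for every class j
    db = b - b[y]                     # (C,):   b_j - b_y
    # Quadratic term from integrating out the Gaussian augmentation:
    # (lam / 2) * (w_j - w_y)^T Sigma_y (w_j - w_y); zero when j == y.
    quad = 0.5 * lam * np.einsum('cd,de,ce->c', dw, cov[y], dw)
    logits = dw @ f + db + quad       # the j == y entry is exactly 0
    m = logits.max()                  # numerically stable log-sum-exp
    return m + np.log(np.exp(logits - m).sum())
```

With `lam=0` the quadratic term vanishes and the expression reduces to the ordinary cross-entropy of the softmax classifier, which is a quick sanity check; larger `lam` penalizes class pairs whose weight difference aligns with the (assumed) augmentation covariance.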
External IDs: dblp:journals/ijcv/ZhouLYZ26