Abstract: The information bottleneck (IB) is a promising defense against adversarial attacks on deep neural networks. However, IB-based methods often suffer from spurious correlations: the prediction is correlated with non-robust features, yet this correlation does not reflect the true causal relationship. Such spurious correlations induce neural networks to learn fragile and incomprehensible (non-robust) features, which limits the potential of IB for further improving adversarial robustness. This paper addresses this issue by incorporating causal inference into the IB-based defense framework. Specifically, we propose CausalIB, a novel defense method that uses instrumental variables to enhance adversarial robustness. For causal effect estimation, the method divides features into two parts: robust features, which capture the semantic information of objects, and non-robust features, which are linked to vulnerable style information. Within this framework, the IB method can mitigate the influence of non-robust features and extract robust features tied to semantic information. We conduct a thorough analysis of the effectiveness of the proposed method. Experiments on MNIST, FashionMNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that our method significantly improves adversarial robustness against multiple adversarial attacks compared to previous methods. Our regularization method improves adversarial robustness in both natural and adversarial training frameworks. Moreover, CausalIB can be applied to both Convolutional Neural Networks and Vision Transformers as a plug-and-play module. Our code is available at https://github.com/HydrogenWasser/CausalIB
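The abstract does not specify implementation details, so the following is only a minimal PyTorch sketch of the general idea it describes: a plug-and-play head that splits backbone features into a "robust" branch used for prediction and a "non-robust" branch compressed by an IB-style penalty. The class name `SplitIBHead`, the Gaussian KL penalty, and the `beta` weight are illustrative assumptions, not the authors' CausalIB implementation.

```python
# Hypothetical sketch (not the authors' code): an IB-style plug-and-play head
# that predicts from a "robust" feature branch while penalizing a "non-robust"
# branch with a variational-IB-like KL term toward N(0, I).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SplitIBHead(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, bottleneck_dim: int = 128):
        super().__init__()
        # Robust (semantic) branch feeds the classifier.
        self.robust_proj = nn.Linear(feat_dim, bottleneck_dim)
        # Non-robust (style) branch is modeled as a Gaussian and compressed.
        self.nonrobust_mu = nn.Linear(feat_dim, bottleneck_dim)
        self.nonrobust_logvar = nn.Linear(feat_dim, bottleneck_dim)
        self.classifier = nn.Linear(bottleneck_dim, num_classes)

    def forward(self, features: torch.Tensor, labels: torch.Tensor, beta: float = 1e-3):
        z_robust = F.relu(self.robust_proj(features))
        mu = self.nonrobust_mu(features)
        logvar = self.nonrobust_logvar(features)
        # IB-style compression of the non-robust branch: KL(N(mu, sigma^2) || N(0, I)).
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
        logits = self.classifier(z_robust)  # prediction uses robust features only
        loss = F.cross_entropy(logits, labels) + beta * kl
        return logits, loss


# Usage: attach the head to any backbone (CNN or ViT) that yields pooled features.
# backbone = ...                       # e.g. a ResNet trunk without its final layer
# head = SplitIBHead(feat_dim=512, num_classes=10)
# logits, loss = head(backbone(images), labels)
```

The split into two branches mirrors the robust/non-robust decomposition the abstract mentions; the actual paper additionally uses instrumental variables for causal effect estimation, which this sketch does not attempt to reproduce.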