Robust purification defense against transfer attacks based on a probabilistic scheduling algorithm of pre-trained models: A model difference perspective
Abstract: Neural networks are vulnerable to meticulously crafted adversarial examples, which cause high-confidence misclassifications in image classification tasks. Because black-box transfer attacks are stealthy and difficult to detect, they have become a major focus of defense research. In this article, we propose a purification defense based on a probabilistic scheduling algorithm of pre-trained models (ProbSched-PTM) to counter diverse transfer attacks. We first quantify the differences among models based on their output scores and verify that adversarial transferability is linearly and negatively correlated with model difference. Guided by the model-difference probability, we then integrate a negative-momentum probability as a regularization factor to construct ProbSched-PTM. The algorithm selects the most appropriate substitute model from a pool of pre-trained models to generate adversarial examples with strong transferability for training the purification model, enabling it to effectively remove diverse adversarial perturbations. The resulting ProbSched-PTM-based purification defense remains robust against unseen adversarial attacks generated from different substitute models. In a black-box attack scenario with ResNet-34 as the target model, our approach achieves average defense rates above 94.8% on CIFAR-10 and above 71.2% on Mini-ImageNet, demonstrating state-of-the-art performance.
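The abstract describes the scheduler only at a high level and does not give its formulas. The sketch below is a minimal, hypothetical Python illustration of one plausible reading: selection probabilities favor substitute models whose output scores differ least from the target (exploiting the reported negative correlation between model difference and transferability), while a decayed selection-frequency term stands in for the negative-momentum regularizer to keep the schedule diverse. The function name `schedule_substitute` and all hyperparameters (`beta`, `lam`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def schedule_substitute(diff_scores, momentum, beta=0.9, lam=0.5):
    """Sample the index of the next substitute model.

    diff_scores : per-model output-score difference vs. the target
                  (lower difference ~ stronger transferability, per the
                  paper's reported negative correlation).
    momentum    : running selection frequencies, updated in place;
                  stands in for the negative-momentum regularizer.
    beta, lam   : decay rate and regularization weight -- illustrative
                  values only.
    """
    diff = np.asarray(diff_scores, dtype=float)
    # Favor models whose outputs differ least from the target.
    logits = -diff / (diff.std() + 1e-8)
    # Negative-momentum regularization: penalize recently chosen
    # models so the generated adversarial examples stay diverse.
    logits -= lam * momentum
    # Numerically stable softmax over candidate models.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(len(probs), p=probs)
    # Decay old selection history and record the new pick.
    momentum *= beta
    momentum[idx] += 1.0 - beta
    return idx, probs

# Toy usage: three candidate pre-trained substitute models.
diffs = [0.12, 0.35, 0.20]
mom = np.zeros(3)
for step in range(5):
    idx, probs = schedule_substitute(diffs, mom)
    print(step, idx, np.round(probs, 3))
```

Each sampled index would pick the substitute model used to craft the next batch of adversarial examples for purification training; the momentum penalty prevents the scheduler from collapsing onto the single lowest-difference model.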