Enhancing Adversarial Transferability with Checkpoints of a Single Model's Training

Published: 01 Jan 2025 · Last Modified: 25 Sept 2025 · CVPR 2025 · CC BY-SA 4.0
Abstract: Adversarial attacks threaten the integrity of deep neural networks (DNNs), particularly in high-stakes applications. In this paper, we present a novel black-box adversarial attack that leverages the diverse checkpoints generated along a single model's training trajectory. Unlike conventional ensemble attacks, which require multiple surrogate models with diverse architectures, our approach exploits the intrinsic diversity captured across the training stages of a single surrogate model. By decomposing the learned representations into task-intrinsic and task-irrelevant components, we employ an accuracy gap-based selection strategy to identify checkpoints that predominantly capture transferable, task-intrinsic knowledge. Extensive experiments on ImageNet and CIFAR-10 demonstrate that our method consistently outperforms traditional ensemble attacks in transferability, even in resource-constrained, practical settings. This work offers a resource-efficient solution for crafting highly transferable adversarial examples and provides new insights into the dynamics of adversarial vulnerability.
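Only the abstract is available here, but the two stages it describes lend themselves to a short sketch: select checkpoints of one training run by an accuracy gap, then average the attack gradient over the selected checkpoints. The PyTorch code below is a minimal, hypothetical illustration under stated assumptions, not the paper's actual method: the selection rule (`max_gap` against the final model's validation accuracy), all function names, and the choice of iterative FGSM as the base attack are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def select_checkpoints(checkpoints, val_accs, final_acc, max_gap=0.10):
    # Keep checkpoints whose validation accuracy is within `max_gap` of the
    # final model's accuracy -- an assumed stand-in for the paper's
    # accuracy gap-based selection strategy.
    return [ckpt for ckpt, acc in zip(checkpoints, val_accs)
            if final_acc - acc <= max_gap]

def checkpoint_ensemble_attack(models, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Iterative FGSM whose gradient is averaged over all selected checkpoints
    # of a single surrogate model (each model assumed to be in eval() mode).
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sum(F.cross_entropy(m(x_adv), y) for m in models) / len(models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the averaged loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep a valid image
    return x_adv.detach()
```

Averaging the loss over checkpoints mirrors a conventional multi-model ensemble attack, except that every ensemble member comes from one training run, which is the source of the resource savings the abstract claims.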