Feature and Temporal Disruption Attacks from Images to Videos

Published: 2025, Last Modified: 07 Jan 2026, Venue: ICME 2025, License: CC BY-SA 4.0
Abstract: Transferability of adversarial examples is a key property in practical black-box scenarios. Recent research has shown that transferable adversarial examples for video models can be crafted effectively with image models. However, existing studies primarily target single-layer features, overlooking the influence of diverse feature layers. Moreover, they neglect transitions between video frames and fail to fully capture temporal context. In this paper, we introduce an efficient and stable cross-modal attack method termed Feature and Temporal Disruption Attack (FTDA). Our approach addresses both feature-space diversity and temporal cues through two modules: Depth-Aware Feature Fusion Attack (DF2A) and Clip-Based Temporal Fusion Attack (CTFA). Extensive experiments demonstrate that our approach achieves state-of-the-art performance. Our code is available at https://github.com/xiaopengge2000/FTDA.