Keywords: adversarial attacks, traffic sign recognition, shadow-based perturbations, vision transformers, attention alignment, black-box attacks, physical-world attacks, temporal consistency, differentiable rendering
TL;DR: SASA is a black-box adversarial attack that uses differentiable, physically realistic shadow patterns guided by frozen transformer attention maps to fool traffic sign recognition systems.
Abstract: We propose \textbf{SASA} (Sequence-Aware Shadow Attack), a black-box adversarial framework that uses physically realistic, differentiable shadow patterns to deceive traffic sign recognition systems. Unlike prior image-based attacks, SASA targets video sequences—common in real-world driving—by generating smooth, temporally consistent shadows that remain visually plausible and imperceptible to humans. Guided by attention maps from frozen vision transformers, SASA aligns shadow placement with semantically salient regions without querying the target model. Evaluated on the GTSRB dataset, SASA reduces classification accuracy by up to 86\% and sequence-level accuracy by over 90\% on black-box models, including CNNs and ViTs. The method generalizes across architectures, preserves perceptual quality, and reveals a novel vulnerability in sequential vision systems.
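To make the idea of a smooth, temporally consistent shadow concrete, here is a minimal sketch. It is not SASA's renderer: it assumes a single straight shadow edge, grayscale frames stored as nested lists with values in [0, 1], and a linear drift of the edge across the sequence. The function names and the parameters `angle`, `offset`, `sharpness`, and `darkness` are hypothetical. The attention-guided placement described in the abstract is not modeled. The sigmoid edge is smooth in `angle` and `offset`, so the same formula could be optimized by gradient descent in an autodiff framework; this plain-Python version computes no gradients.

```python
import math


def soft_shadow_mask(height, width, angle, offset, sharpness=4.0):
    """Soft half-plane mask in [0, 1]; values near 1 mean fully shadowed."""
    nx, ny = math.cos(angle), math.sin(angle)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Signed distance of the pixel from the shadow edge.
            d = nx * x + ny * y - offset
            # Clamp the exponent so math.exp cannot overflow on large frames.
            z = max(-60.0, min(60.0, sharpness * d))
            row.append(1.0 / (1.0 + math.exp(-z)))
        mask.append(row)
    return mask


def apply_shadow(frame, mask, darkness=0.5):
    """Darken a grayscale frame where the mask is high."""
    return [
        [p * (1.0 - darkness * m) for p, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


def shadow_sequence(frames, angle, start_offset, end_offset, darkness=0.5):
    """Shift the shadow edge linearly over time so it changes smoothly."""
    n = len(frames)
    shaded = []
    for t, frame in enumerate(frames):
        alpha = t / (n - 1) if n > 1 else 0.0
        offset = (1.0 - alpha) * start_offset + alpha * end_offset
        mask = soft_shadow_mask(len(frame), len(frame[0]), angle, offset)
        shaded.append(apply_shadow(frame, mask, darkness))
    return shaded


# Usage: three uniform 4x4 frames with a vertical edge drifting rightward.
frames = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]
out = shadow_sequence(frames, angle=0.0, start_offset=1.5, end_offset=2.5)
```

Because the mask is a sigmoid of the signed distance to the edge, neighboring pixels and neighboring frames differ only gradually, which is the property the abstract refers to as temporal consistency.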
Submission Number: 21