Abstract: Existing transferable attack methods commonly assume that the attacker knows details of the training set (e.g., the label set, the input size) of the opaque-box victim models. This assumption is often unrealistic, because the attacker may not have access to this information. In this paper, we define a Generalized Transferable Attack (GTA) problem in which the attacker operates without prior knowledge of these specifics and must attack randomly encountered images, potentially from unknown datasets. To solve the challenging GTA problem, we propose a novel Image Classification Disruptor (ICD), designed to train a particular attack that disrupts the classification information of any image from arbitrary datasets. Experiments across several datasets demonstrate that ICD clearly outperforms existing transferable attacks on GTA, and show that ICD uses similar texture-like noise to perturb different images from different datasets. Moreover, we observe that ICD noise across images mainly consists of three specific-frequency sine waves, one each for the R, G, and B channels. Inspired by this finding, we also design a second novel method, Sine Attack (SA), which directly optimizes the three sine waves. Experiments show that SA performs comparably to ICD, revealing a notable vulnerability in CNNs under the GTA setting.
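To make the sine-wave observation concrete, the following is a minimal sketch of what a per-channel sine perturbation could look like. It is not the paper's implementation: the abstract does not say how the waves are parameterized, so the plane-wave form, the function names `sine_noise` and `apply_noise`, the parameters `freqs`, `angles`, and `phases`, and the amplitude bound `eps = 8/255` are all assumptions made for illustration. The optimization of these parameters, which SA performs, is not shown.

```python
import math


def sine_noise(height, width, freqs, angles, phases, eps=8 / 255):
    """Build an H x W x 3 nested list with one plane sine wave per channel.

    Assumed (hypothetical) parameterization: each channel has a spatial
    frequency in cycles per pixel, a direction in radians, and a phase.
    """
    noise = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = []
            for f, theta, phi in zip(freqs, angles, phases):
                # Project the pixel position onto the wave direction.
                proj = x * math.cos(theta) + y * math.sin(theta)
                # Amplitude eps keeps every value within [-eps, eps].
                pixel.append(eps * math.sin(2 * math.pi * f * proj + phi))
            row.append(pixel)
        noise.append(row)
    return noise


def apply_noise(image, noise):
    """Add noise to an image with values in [0, 1] and clip the result."""
    return [
        [
            [min(1.0, max(0.0, v + n)) for v, n in zip(px, noise_px)]
            for px, noise_px in zip(img_row, noise_row)
        ]
        for img_row, noise_row in zip(image, noise)
    ]
```

Because the noise depends only on pixel coordinates and a handful of scalars, the same pattern can be generated for any input size and added to any image, which is consistent with the dataset-agnostic setting the abstract describes.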
External IDs: doi:10.1109/tcsvt.2025.3597841