Abstract: Recent studies have shown that ad-hoc image transformations are effective in defending against certain adversarial attacks. It is desirable to determine which transformations are more effective than others for adversarial defense before they are deployed in practice. In this paper, we propose the notion of Critical Transformation Robustness (CTR), which can potentially indicate the detection performance of an input transformation through the difference between its CTR on clean inputs and its CTR on adversarial ones. Based on this new notion, we further present a general training framework that delivers an effective and efficient adversarial detector, featuring a customized combination of specific input transformations for defending against the given, possibly mixed, adversarial attacks. We evaluate our training framework on 3 typical image datasets with 17 types of popular input transformations for detecting a mixture of 22 types of adversarial attacks. Experimental results show that the CTR differences between clean and adversarial inputs can effectively guide the selection of an effective combination of input transformations with nearly-optimal parameter values. Furthermore, compared with state-of-the-art input transformation-based adversarial detection methods, the detectors generated by our training framework achieve, on average, 73.4%–87.5% higher performance on mixed adversarial attacks.
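To make the core idea concrete, below is a minimal sketch of how a CTR-style score might be computed, assuming one plausible reading of the abstract: CTR is taken here as the smallest transformation strength at which the model's prediction flips, and a transformation is scored by the gap between its mean CTR on clean versus adversarial inputs. All names (`model_predict`, `transform`, `strengths`) and this definition of CTR are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def ctr(model_predict, transform, x, strengths):
    """Hypothetical CTR: the smallest transformation strength at which
    the model's prediction on x changes.

    model_predict: callable mapping an image to a class label (assumed).
    transform:     callable mapping (image, strength) to a transformed
                   image, e.g. Gaussian blur with a given sigma (assumed).
    strengths:     increasing sequence of transformation strengths.
    """
    base_label = model_predict(x)
    for s in strengths:
        if model_predict(transform(x, s)) != base_label:
            return s
    return strengths[-1]  # prediction never flipped within the sweep

def ctr_gap(model_predict, transform, clean_batch, adv_batch, strengths):
    """Score a transformation by the gap between its mean CTR on clean
    inputs and on adversarial inputs; per the abstract, a larger gap
    suggests the transformation better separates the two populations."""
    ctr_clean = np.mean([ctr(model_predict, transform, x, strengths)
                         for x in clean_batch])
    ctr_adv = np.mean([ctr(model_predict, transform, x, strengths)
                       for x in adv_batch])
    return abs(ctr_clean - ctr_adv)
```

Under this reading, ranking candidate transformations by `ctr_gap` and keeping the top scorers would correspond to the abstract's claim that CTR differences guide the selection of an effective combination of transformations; the actual framework and parameter search in the paper may differ.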