Defending against black-box adversarial attacks with gradient-free trained sign activation neural networks

Yunzhe Xue; Meiyan Xie; Zhibo Yang; Usman Roshan

28 Sept 2020 (modified: 05 May 2023), ICLR 2021 Conference Blind Submission, Readers: Everyone

Keywords: sign activation neural network, gradient-free training, stochastic coordinate descent, black box adversarial attack, hopskipjump, transferability, image distortion

Abstract: While machine learning models today can achieve high accuracy on classification tasks, they can be deceived by minor, imperceptible distortions to the data. These are known as adversarial attacks and can be lethal in the black-box setting, which does not require knowledge of the target model type or its parameters. Binary neural networks that have sign activation and are trained with gradient descent have been shown to be harder to attack than conventional sigmoid activation networks, but their improvements are marginal. We instead train sign activation networks with a novel gradient-free stochastic coordinate descent algorithm and propose an ensemble of such networks as a defense model. We evaluate the robustness of our model (a hard problem in itself) on image, text, and medical ECG data and find it to be more robust than ensembles of binary, full-precision, and convolutional neural networks, and than random forests, while attaining comparable clean test accuracy. To explain our model's robustness, we show that an adversary targeting a single network in our ensemble fails to attack the other networks in the ensemble; that is, its adversarial examples do not transfer to them. Thus a datapoint requires a large distortion to fool the majority of networks in our ensemble and is likely to be detected in advance. This non-transferability arises naturally from the non-convexity of sign activation networks and the randomization in our gradient-free training algorithm, without any adversarial defense effort.
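To make the idea in the abstract concrete, here is a minimal sketch of gradient-free training by stochastic coordinate descent followed by majority voting. It is not the authors' algorithm: it assumes a single sign-activation unit instead of a network, a full-batch 0-1 loss, a fixed step size, and an accept-if-not-worse rule, none of which the abstract specifies. The names `train_scd`, `zero_one_loss`, and `ensemble_predict` are hypothetical.

```python
import random


def sign(z):
    """Sign activation; zero is mapped to +1 so the output is always +1 or -1."""
    return 1 if z >= 0 else -1


def predict_one(w, b, x):
    """Prediction of a single sign-activation unit."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


def zero_one_loss(w, b, X, y):
    """Fraction of misclassified points; no gradient is ever computed."""
    return sum(predict_one(w, b, x) != t for x, t in zip(X, y)) / len(X)


def train_scd(X, y, iters=300, step=0.5, seed=0):
    """Illustrative stochastic coordinate descent (an assumption, not the paper's method).

    Each iteration perturbs one randomly chosen coordinate (a weight or the bias)
    and keeps the change only if the 0-1 loss does not increase.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    b = 0.0
    best = zero_one_loss(w, b, X, y)
    for _ in range(iters):
        j = rng.randrange(d + 1)          # index d stands for the bias
        delta = rng.choice([-step, step])
        old = w[j] if j < d else b
        if j < d:
            w[j] = old + delta
        else:
            b = old + delta
        loss = zero_one_loss(w, b, X, y)
        if loss <= best:
            best = loss                   # accept the move
        elif j < d:
            w[j] = old                    # reject: restore the weight
        else:
            b = old                       # reject: restore the bias
    return w, b


def ensemble_predict(models, x):
    """Majority vote over independently seeded models (use an odd number of models)."""
    return sign(sum(predict_one(w, b, x) for w, b in models))
```

Training each ensemble member with a different seed, for example `models = [train_scd(X, y, seed=s) for s in range(5)]`, gives each one a different random start and search path. This is the kind of randomization the abstract credits for non-transferability, though this toy sketch does not demonstrate that property.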

One-sentence Summary: We show that an ensemble of our gradient-free trained sign activation networks is much more adversarially robust than ensembles of binary, full-precision, and convolutional neural networks, and than random forests, on image, text, and medical ECG data.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Reviewed Version (pdf): https://openreview.net/references/pdf?id=2qogKBs4vU
