Abstract: Constructing adversarial examples usually requires labels, which provide the loss gradient used to construct the example. We show that for batch-normalized architectures, intermediate latents produced after a batch normalization step suffice to produce adversarial examples, using an intermediate loss based solely on angular deviations and requiring no labels. We motivate this loss through the geometry of batch-normalized representations and their concentration on a known hypersphere. Our losses build on and extend intermediate-latent-based attacks, which usually require labels. The success of our method implies that leakage of intermediate representations may suffice to create a security breach for deployed models, one that persists even when the model is transferred to downstream use. We further show that removing batch norm weakens our attack significantly, suggesting that batch norm's contribution to adversarial vulnerability may be understood by analyzing such attacks.
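The abstract does not give the exact formulation, but a minimal sketch of what a label-free, angular-deviation attack on post-batch-norm latents could look like is below; the choice of layer, the `angular_deviation_attack` helper, and the PGD-style hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch, hypothetical): a label-free PGD-style attack that
# pushes the post-batch-norm latent of the input away (in angle) from its
# clean value. Model, layer choice, and step sizes are assumptions.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()  # pretrained weights would be used in practice

# Capture the output of one batch-norm layer via a forward hook.
captured = {}
def hook(_module, _inp, out):
    captured["latent"] = out
model.layer3[0].bn2.register_forward_hook(hook)  # assumed layer choice

def angular_deviation_attack(x, eps=8 / 255, alpha=2 / 255, steps=20):
    """Perturb x so its post-BN latent rotates away from the clean latent."""
    with torch.no_grad():
        model(x)
        clean = captured["latent"].flatten(1).detach()

    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        model(x_adv)
        adv = captured["latent"].flatten(1)
        # Maximizing angular deviation is equivalent to minimizing cosine similarity.
        loss = F.cosine_similarity(adv, clean, dim=1).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()               # descend cosine similarity
            x_adv = x + (x_adv - x).clamp(-eps, eps)          # project onto L_inf ball
            x_adv = x_adv.clamp(0, 1)                         # keep a valid image
    return x_adv.detach()

# Usage: x is a batch of images scaled to [0, 1].
# x_adv = angular_deviation_attack(x)
```

Note that no label appears anywhere in the sketch: the only supervision is the clean input's own intermediate representation, which matches the abstract's claim that leaked intermediate latents alone can drive the attack.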