STRATA: Simple, Gradient-free Attacks for Models of Code

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Deep Learning, Models of Code, Black-box Adversarial Attacks, Adversarial Robustness
Abstract: Adversarial examples are imperceptible perturbations to the input of a neural model that result in misclassification. Generating adversarial examples for source code poses an additional challenge compared to the image and natural language domains, because source code perturbations must adhere to strict semantic constraints so that the perturbed program preserves the functionality of the original. We propose a simple and efficient gradient-free method for generating state-of-the-art adversarial examples on models of code that can be applied in a white-box or black-box setting. Our method generates both untargeted and targeted attacks, and empirically outperforms competing gradient-based methods while requiring less information and less computational effort.
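To make the gradient-free, black-box setting concrete, the sketch below shows one plausible attack loop of the kind the abstract describes: semantics-preserving identifier renames are tried greedily, keeping whichever rename most lowers the model's confidence in the true label. This is a minimal illustration, not the paper's actual STRATA algorithm; the `predict_proba` interface and the `CANDIDATE_TOKENS` pool are assumptions introduced here for the example.

```python
import re
from typing import Callable, Dict, List

# Assumed pool of replacement tokens; the real method's token-selection
# strategy is described in the paper, not reproduced here.
CANDIDATE_TOKENS: List[str] = ["tmp", "value", "result", "buffer", "data"]

def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename one identifier everywhere it appears as a whole word.

    Whole-word replacement keeps the program's behavior unchanged
    (a production attack would use a parser to avoid touching
    occurrences inside strings or comments).
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def untargeted_attack(
    code: str,
    identifier: str,
    true_label: str,
    predict_proba: Callable[[str], Dict[str, float]],  # hypothetical black-box model API
) -> str:
    """Greedy gradient-free search: keep the rename that most lowers P(true_label).

    The model is only queried as a black box; no gradients are used.
    """
    best_code = code
    best_p = predict_proba(code).get(true_label, 0.0)
    for token in CANDIDATE_TOKENS:
        perturbed = rename_identifier(code, identifier, token)
        p = predict_proba(perturbed).get(true_label, 0.0)
        if p < best_p:
            best_code, best_p = perturbed, p
    return best_code
```

A targeted variant follows the same structure, maximizing the probability of a chosen target label instead of minimizing the true label's probability.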
One-sentence Summary: We present an efficient state-of-the-art method for constructing gradient-free adversarial attacks for models of code that outperform currently available gradient-based attacks.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=yZJC0_HIHB