TOWARDS FEATURE SPACE ADVERSARIAL ATTACKDownload PDF

25 Sept 2019 (modified: 22 Oct 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone
Abstract: We propose a new type of adversarial attack to Deep Neural Networks (DNNs) for image classification. Different from most existing attacks that directly perturb input pixels. Our attack focuses on perturbing abstract features, more specifically, features that denote styles, including interpretable styles such as vivid colors and sharp outlines, and uninterpretable ones. It induces model misclassfication by injecting style changes insensitive for humans, through an optimization procedure. We show that state-of-the-art pixel space adversarial attack detection and defense techniques are ineffective in guarding against feature space attacks.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2004.12385/code)
Original Pdf: pdf
7 Replies

Loading