Evading Black-box Classifiers Without Breaking Eggs

Published: 07 Mar 2024, Last Modified: 07 Mar 2024 · SaTML 2024
Keywords: security, threat models, black-box adversarial examples, decision-based attacks
TL;DR: We propose a new real-world oriented metric for black-box decision-based attacks on security-critical systems.
Abstract: Decision-based evasion attacks repeatedly query a black-box classifier to generate adversarial examples. Prior work measures the cost of such attacks by the total number of queries made to the classifier. We argue this metric is incomplete. Many security-critical machine learning systems aim to weed out "bad" data (e.g., malware or harmful content). Queries to such systems carry a fundamentally *asymmetric cost*: "flagged" queries, i.e., those detected as "bad", come at a higher cost because they trigger additional security filters, e.g., usage throttling or account suspension. Yet, we find that existing decision-based attacks issue a large number of queries that would get flagged by a real-world system, which likely renders them ineffective against security-critical systems. We then design new attacks that reduce the number of flagged queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-flagged) queries. We thus pose it as an open problem to build black-box attacks that are more effective under realistic cost metrics.
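To make the proposed cost metric concrete, the following is a minimal Python sketch (not from the paper; the `AsymmetricCostOracle` wrapper and the toy classifier are hypothetical) of a black-box oracle that counts flagged queries separately from total queries, reflecting the asymmetric cost the abstract describes:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AsymmetricCostOracle:
    """Hypothetical wrapper around a black-box decision oracle.

    Tracks the asymmetric query cost: flagged queries (those the
    classifier detects as "bad") are counted separately, since in a
    real system they carry a higher cost (throttling, suspension).
    """
    classifier: Callable[[Any], bool]  # returns True if the input is flagged
    total_queries: int = 0
    flagged_queries: int = 0

    def query(self, x: Any) -> bool:
        self.total_queries += 1
        is_flagged = self.classifier(x)
        if is_flagged:
            # The higher-cost event: this query would trip security filters.
            self.flagged_queries += 1
        return is_flagged


# Toy usage: flag any input above a threshold, then query the oracle.
oracle = AsymmetricCostOracle(classifier=lambda x: x > 0.5)
for x in [0.1, 0.7, 0.4, 0.9]:
    oracle.query(x)
print(oracle.total_queries, oracle.flagged_queries)  # 4 2
```

Under this metric, an attack is evaluated not only by `total_queries` (the standard measure in prior work) but also by `flagged_queries`, which the paper argues is the dominant cost against security-critical systems.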
Submission Number: 124