Abstract: The problem of learning to distinguish good inputs from malicious has come to be known as adversarial classification emphasizing the fact that, unlike traditional classification, the adversary can manipulate input instances to avoid being so classified. We offer the first general theoretical analysis of the problem of adversarial classification, resolving several important open questions in the process. First, we significantly generalize previous results on adversarial classifier reverse engineering (ACRE), showing that if a classifier can be efficiently learned, it can subsequently be efficiently reverse engineered with arbitrary precision. We extend this result to randomized classification schemes, but now observe that reverse engineering is imperfect, and its efficacy depends on the defender's randomization scheme. Armed with this insight, we proceed to characterize optimal randomization schemes in the face of adversarial reverse engineering and classifier manipulation. What we find is quite surprising: in all the model variations we consider, the defender's optimal policy tends to be either to randomize uniformly (ignoring baseline classification accuracy), which is the case for targeted attacks, or not to randomize at all, which is typically optimal when attacks are indiscriminate.
0 Replies
Loading