Adversarial Surrogate Risk Bounds for Binary Classification

Published: 23 Oct 2025, Last Modified: 23 Oct 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: A central concern in classification is the vulnerability of machine learning models to adversarial attacks. Adversarial training, one of the most popular techniques for training robust classifiers, involves minimizing an adversarial surrogate risk. Recent work has characterized the conditions under which any sequence minimizing the adversarial surrogate risk also minimizes the adversarial classification risk in the binary setting, a property known as \emph{adversarial consistency}. However, these results do not address the rate at which the adversarial classification risk approaches its optimal value along such a sequence. This paper provides surrogate risk bounds that quantify this convergence rate.
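For orientation, the following is a minimal sketch of the standard quantities behind these terms, assuming binary labels $y \in \{-1,+1\}$, a perturbation radius $\epsilon$, and a surrogate loss $\phi$; the notation is illustrative and not taken from the paper itself. The adversarial classification risk and the adversarial surrogate risk of a score function $f$ are
\[
  R^{\epsilon}(f) \;=\; \mathbb{E}_{(\mathbf{x},y)}\Big[\sup_{\|\mathbf{x}'-\mathbf{x}\|\le\epsilon} \mathbf{1}\{\operatorname{sign} f(\mathbf{x}') \ne y\}\Big],
  \qquad
  R^{\epsilon}_{\phi}(f) \;=\; \mathbb{E}_{(\mathbf{x},y)}\Big[\sup_{\|\mathbf{x}'-\mathbf{x}\|\le\epsilon} \phi\big(y f(\mathbf{x}')\big)\Big].
\]
Adversarial consistency means that $R^{\epsilon}_{\phi}(f_n) \to \inf_g R^{\epsilon}_{\phi}(g)$ implies $R^{\epsilon}(f_n) \to \inf_g R^{\epsilon}(g)$, while a surrogate risk bound quantifies the rate of this convergence via a function $\Psi$ with $\Psi(0)=0$ such that
\[
  R^{\epsilon}(f) - \inf_{g} R^{\epsilon}(g) \;\le\; \Psi\Big(R^{\epsilon}_{\phi}(f) - \inf_{g} R^{\epsilon}_{\phi}(g)\Big).
\]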
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
- We streamlined the proof of Theorem 9 into lemmas. This modification led to a slightly improved constant in the bound.
- We showed a lower bound for Theorem 9.
- We tightened and shortened the background section.
- In the introduction and background sections, we added some discussion on why surrogate bounds are harder in the adversarial scenario.
- We added bounds that apply even when the adversarial Bayes classifier is not unique, under mild assumptions (Theorems 10 and 12). These theorems make it easier to understand our results in the context of real-world datasets.
- We added a subsection discussing our bounds in the context of real-world datasets.
- We improved the clarity of the writing in this paper.
- We fixed an error in the proof of Theorem 8 in Appendix D and made the proof much shorter.
- We streamlined the proofs in Appendix G, which led to a slightly better bound in Theorem 11.
- We fixed the grammar of our citations.
Supplementary Material: pdf
Assigned Action Editor: ~Han_Bao2
Submission Number: 4973