How adversarial attacks can disrupt seemingly stable accurate classifiers

Published: 01 Jan 2024 · Last Modified: 12 May 2025 · Neural Networks 2024 · CC BY-SA 4.0
Abstract (Highlights):
• A new theory for studying accuracy, adversarial attacks, and robustness is presented.
• We present experiments confirming the theory on standard benchmarks.
• The theory reveals when adversarial attacks affect seemingly stable classifiers.
• Adding noise during training is inefficient for eradicating adversarial examples.
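The highlights concern adversarial perturbations: small input changes that flip the predictions of otherwise accurate, seemingly stable classifiers. For orientation only, the sketch below shows a generic gradient-sign (FGSM-style) perturbation in PyTorch; the model, data, and epsilon value are placeholder assumptions, and this is not the specific attack construction or theory analysed in the paper.

```python
# Minimal sketch of a gradient-sign (FGSM-style) adversarial perturbation.
# Generic illustration only; model, inputs, and epsilon are placeholder assumptions.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 0.03) -> torch.Tensor:
    """Return x shifted by epsilon * sign(grad_x loss), a small perturbation
    that can change the prediction of an otherwise accurate classifier."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)  # loss w.r.t. true labels
    loss.backward()                                      # gradient of loss in input space
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```

Comparing a classifier's accuracy on `x` versus `fgsm_perturb(model, x, y)` is one common way to probe the gap between apparent stability and adversarial robustness that the highlights describe.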
