How adversarial attacks can disrupt seemingly stable accurate classifiers

Published: 01 Jan 2024, Last Modified: 12 May 2025 · Neural Networks 2024 · CC BY-SA 4.0
Abstract: Highlights
• A new theory for studying accuracy, adversarial attacks, and robustness is presented.
• We present experiments confirming the theory on standard benchmarks.
• The theory reveals when adversarial attacks can affect seemingly stable classifiers.
• Adding noise during training is inefficient for eradicating adversarial examples.