Abstract: Modern neural networks perform at least as well as humans on numerous tasks involving object classification and image generation. However, there is also evidence that perturbations imperceptible to humans can significantly degrade the performance of well-trained deep neural networks. We provide a Distributionally Robust Optimization (DRO) framework that integrates human-based image quality assessment methods to design optimal attacks that are imperceptible to humans but significantly damaging to deep neural networks. Our attack algorithm can generate better-quality (less perceptible to humans) attacks than other state-of-the-art human-imperceptible attack methods. We also provide an algorithmic implementation, of independent interest, that can significantly speed up DRO training. Finally, we demonstrate how optimally designed human-imperceptible attacks can improve group fairness in image classification while maintaining similar accuracy.
Supplementary Material: zip