Abstract: In recent years, much work has been devoted to designing certified
defences for neural networks, i.e., methods for learning neural
networks that are provably robust to certain adversarial
perturbations. Due to the non-convexity of the problem, dominant
approaches in this area rely on convex approximations, which are
inherently loose. In this paper, we question the effectiveness of such
approaches for realistic computer vision tasks. First, we provide
extensive empirical evidence showing that certified defences suffer from
not only worse accuracy but also worse robustness and fairness than
empirical defences. We hypothesise that certified defences generalise
poorly because of (i) the large number of relaxed non-convex constraints
and (ii) strong alignment between the adversarial perturbations and the
"signal" direction. We provide a
combination of theoretical and experimental evidence to support these
hypotheses.
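
To make the idea of a convex approximation concrete, below is a minimal, self-contained sketch (not taken from the paper) of interval bound propagation, one common convex relaxation used by certified defences; the two-layer network, its shapes, and the perturbation radius eps are illustrative assumptions.

```python
# Illustrative sketch only: interval bound propagation (IBP), a simple
# convex relaxation used to certify robustness. Network architecture,
# shapes, and eps are hypothetical, not the paper's setup.
import numpy as np

def ibp_bounds(W1, b1, W2, b2, x, eps):
    """Propagate the L-infinity box [x - eps, x + eps] through a
    linear -> ReLU -> linear network; return sound output bounds."""
    lo, hi = x - eps, x + eps

    # Linear layer: split weights into positive/negative parts so each
    # output bound takes the worst case over the input interval.
    W1_pos, W1_neg = np.clip(W1, 0, None), np.clip(W1, None, 0)
    lo1 = W1_pos @ lo + W1_neg @ hi + b1
    hi1 = W1_pos @ hi + W1_neg @ lo + b1

    # ReLU is monotone, so it maps interval endpoints to endpoints.
    lo1, hi1 = np.maximum(lo1, 0), np.maximum(hi1, 0)

    W2_pos, W2_neg = np.clip(W2, 0, None), np.clip(W2, None, 0)
    lo2 = W2_pos @ lo1 + W2_neg @ hi1 + b2
    hi2 = W2_pos @ hi1 + W2_neg @ lo1 + b2
    # Certified but loose: the true outputs lie inside [lo2, hi2],
    # yet the box can be much wider than the reachable set.
    return lo2, hi2
```

The interval width typically grows with every layer because each relaxation step discards correlations between neurons; this widening is the sense in which such convex approximations are "inherently loose".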