A Theoretical Perspective on the Robustness of Feature Extractors

Published: 20 Jun 2023 · Last Modified: 07 Aug 2023 · AdvML-Frontiers 2023
Keywords: adversarial robustness, feature extractors, deep neural networks, lower bounds
TL;DR: Finding lower bounds on the robustness of classifiers built with fixed feature extractors
Abstract: Recent theoretical work on robustness to adversarial examples has derived lower bounds on how robust *any model* can be once the data distribution and adversarial constraints are specified. However, these bounds do not account for the specific models used in practice, such as neural networks. In this paper, we develop a methodology to analyze the fundamental limits on the *robustness of fixed feature extractors*, which in turn yields bounds on the robustness of any classifier trained on top of them. The tightness of these bounds depends on how effectively we can find collisions, i.e., pairs of perturbed examples that map to the same representation at deeper layers. For linear feature extractors, we provide closed-form expressions for collision finding; for piecewise-linear feature extractors, we propose a bespoke algorithm, based on iteratively solving a convex program, that provably finds collisions. We use our bounds to identify structural features of classifiers that lead to a lack of robustness and to provide insights into the effectiveness of different training methods at obtaining robust feature extractors.
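To make the linear case concrete, here is a minimal sketch (not the paper's code) of closed-form collision finding for a linear feature extractor f(x) = Wx. Minimizing the total squared perturbation subject to W(x1 + δ1) = W(x2 + δ2) reduces to a minimum-norm linear system solvable with the pseudoinverse. The function name `linear_collision`, the ℓ2 reading of the budget, and the toy dimensions are illustrative assumptions, not details from the paper.

```python
# Sketch: closed-form collision finding for a linear feature extractor
# f(x) = W x. Given inputs x1, x2, we seek perturbations delta1, delta2
# minimizing ||delta1||^2 + ||delta2||^2 subject to
#     W (x1 + delta1) == W (x2 + delta2).
# With u = delta1 - delta2, the constraint becomes W u = W (x2 - x1),
# whose minimum-norm solution is u = pinv(W) @ W @ (x2 - x1), split
# symmetrically as delta1 = u/2, delta2 = -u/2.
import numpy as np

def linear_collision(W: np.ndarray, x1: np.ndarray, x2: np.ndarray):
    """Return (x1', x2') with W x1' == W x2' and minimal total L2 perturbation."""
    d = W @ (x2 - x1)                 # feature-space gap to close
    u = np.linalg.pinv(W) @ d         # minimum-norm solution of W u = d
    delta1, delta2 = 0.5 * u, -0.5 * u
    return x1 + delta1, x2 + delta2

# Hypothetical usage: if both perturbations fit within the adversary's
# budget and x1, x2 carry different labels, no classifier built on top
# of W can be correct on both perturbed points.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))     # wide linear extractor (toy sizes)
x1, x2 = rng.standard_normal(64), rng.standard_normal(64)
x1p, x2p = linear_collision(W, x1, x2)
assert np.allclose(W @ x1p, W @ x2p)  # exact collision in feature space
```

For piecewise-linear extractors the paper instead solves a convex program iteratively; that algorithm is not reproduced here.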
Submission Number: 70