Abstract: While current methods for training robust deep learning models optimize robust accuracy, in practice the resulting models are often both robust and inaccurate on numerous samples, providing a false sense of safety for those samples. Further, these methods significantly reduce natural accuracy, which hinders adoption in practice. In this work, we address both challenges by extending prior work in three main directions. First, we propose a new training method that jointly maximizes robust accuracy and minimizes robust inaccuracy. Second, since the resulting models are trained to be robust only where they are accurate, we leverage robustness as a principled abstain mechanism. Finally, this abstain mechanism allows us to combine models in a compositional architecture that significantly boosts overall robustness without sacrificing accuracy. We demonstrate the effectiveness of our approach for both empirical and certified robustness on six recent state-of-the-art models and on several datasets. Our results show that our method reduces the number of robust-but-inaccurate samples by up to 97.28%. Further, it raises the $\epsilon_\infty = 1/255$ robustness of a state-of-the-art model from 26% to 86% while only marginally reducing its natural accuracy from 97.8% to 97.6%.
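The compositional architecture described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `is_robust` callback stands in for whatever robustness test a concrete instantiation uses (an empirical attack or a certification procedure), and all names here are hypothetical.

```python
from typing import Any, Callable, List

def compositional_predict(
    models: List[Callable[[Any], Any]],
    is_robust: Callable[[Callable[[Any], Any], Any], bool],
    x: Any,
) -> Any:
    """Query models in order; accept a model's prediction only when it is
    robust at x (and hence, by the joint training objective, likely
    accurate there). The last model is an unconditional fallback, so a
    prediction is always returned."""
    for model in models[:-1]:
        if is_robust(model, x):
            return model(x)  # robust at x: accept, do not abstain
    return models[-1](x)     # every earlier model abstained

# Toy demonstration with stand-in models.
specialized = lambda x: "cat"   # hypothetical model, robust on some inputs
fallback = lambda x: "dog"      # hypothetical accurate fallback model

# When the specialized model is robust at x, its prediction is used:
print(compositional_predict([specialized, fallback],
                            lambda m, x: m is specialized, x=0))  # cat
# Otherwise it abstains and the fallback answers:
print(compositional_predict([specialized, fallback],
                            lambda m, x: False, x=0))             # dog
```

The key design point is that robustness doubles as the abstain signal: no separate rejector network is needed, and the fallback model's natural accuracy is preserved on inputs where earlier models decline to answer.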