Theoretical Analysis of Relative Errors in Gradient Computations for Adversarial Attacks with CE Loss
Keywords: adversarial attacks, floating-point errors, robustness evaluation, optimal scaling factor, theoretical analysis
Abstract: Gradient-based adversarial attacks using the Cross-Entropy (CE) loss often overestimate robustness due to relative errors in gradient computation induced by floating-point arithmetic. Empirical methods like MIFPE mitigate this by scaling logits with a factor $ c = T / \Delta_{\text{detach}} $ where $ T = 1 $, significantly improving evaluation accuracy. However, a theoretical understanding of these errors remains limited.
To bridge this gap, we present the first rigorous theoretical analysis of floating-point errors in CE-based gradient attacks, examining relative errors in four distinct scenarios: (i) unsuccessful untargeted attacks, (ii) successful untargeted attacks, (iii) unsuccessful targeted attacks, and (iv) successful targeted attacks. The analysis uncovers novel patterns of numerical instability and derives the optimal scaling factor $ T = t^* $ that minimizes the impact of these errors in each scenario. Notably, it shows that $ t^* $ is close to 1 in unsuccessful untargeted attacks, which provides a theoretical justification for MIFPE's empirical choice and addresses prior optimality gaps.
Building on these derivations, we refine MIFPE by setting $ T = t^* $, yielding the Theoretical MIFPE (T-MIFPE) loss function, which further reduces floating-point-induced errors. Comprehensive experiments support the theoretical results.
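Illustrative sketch (not part of the submission): the snippet below shows, under stated assumptions, how the CE gradient with respect to the logits can vanish in floating-point arithmetic during an unsuccessful untargeted attack, and how scaling the logits by $ c = T / \Delta_{\text{detach}} $ with $ T = 1 $ restores a usable gradient. The abstract does not define $ \Delta_{\text{detach}} $; here it is assumed to be the gap between the true-class logit and the largest other logit, treated as a constant. The function names and the logit values are hypothetical, and the gap is deliberately exaggerated so that the effect appears even in double precision.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ce_grad(z, y, c=1.0):
    # Gradient of CE(softmax(c * z), y) with respect to z,
    # with the scale c treated as a constant (detached).
    p = softmax([c * v for v in z])
    return [c * (p[i] - (1.0 if i == y else 0.0)) for i in range(len(z))]

def scale_factor(z, y, T=1.0):
    # ASSUMPTION: delta is the gap between the true-class logit and the
    # largest other logit; the abstract does not spell this out.
    others = [v for i, v in enumerate(z) if i != y]
    delta = abs(z[y] - max(others))
    return T / delta

# Hypothetical logits for a confidently and correctly classified input.
z = [900.0, 100.0, 50.0]
y = 0

plain = ce_grad(z, y)                       # exp(-800) underflows to 0.0
scaled = ce_grad(z, y, scale_factor(z, y))  # logit gap rescaled to T = 1

print("unscaled gradient:", plain)
print("scaled gradient:  ", scaled)
```

Without scaling, every softmax entry other than the true class underflows to zero, so the computed gradient is exactly zero and a gradient-based attack has no direction to follow. With the gap rescaled to 1, the gradient is nonzero and points away from the true class.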
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 10896