Adaptive Norm Selection Prevents Catastrophic Overfitting in Fast Adversarial Training

Published: 29 Sept 2025, Last Modified: 12 Oct 2025, NeurIPS 2025 - Reliable ML Workshop, License: CC BY 4.0
Keywords: Adversarial Robustness; Fast Adversarial Training; Catastrophic Overfitting; Gradient Concentration; Adaptive Norm Selection; $l^p$ Norms; Participation Ratio; Fixed-Point Formulation
TL;DR: We prevent catastrophic overfitting in fast adversarial training through adaptive $l^p$-FGSM, which dynamically adjusts norm selection based on gradient concentration metrics without requiring noise injection or complex regularization.
Abstract: We present a novel solution to Catastrophic Overfitting (CO) in fast adversarial training based solely on adaptive $l^p$ norm selection. Unlike existing methods that require noise injection, regularization, or gradient clipping, our approach dynamically adjusts the training norm based on gradient concentration, preventing the vulnerability to multi-step attacks that plagues single-step methods. We begin with the empirical observation that, for small perturbations, CO occurs predominantly under the $l^{\infty}$ norm rather than the $l^2$ norm. Building on this observation, we formulate generalized $l^p$ attacks as a fixed-point problem and develop $l^p$-FGSM to analyze the $l^2$-to-$l^{\infty}$ transition. Our key finding is that CO arises when concentrated gradients, whose information is localized in a few dimensions, meet aggressive norm constraints. We quantify gradient concentration via the Participation Ratio, a measure borrowed from quantum mechanics, together with entropy metrics, yielding an adaptive $l^p$-FGSM that adjusts the training norm to the gradient structure. Experiments show our method achieves robust performance without auxiliary regularization or noise injection, offering a principled solution to the CO problem.
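
The paper's code is not included on this page, so the following is a minimal sketch, assuming PyTorch and standard definitions, of the two ingredients named in the abstract: the Participation Ratio of a per-example gradient and a generalized $l^p$-FGSM step obtained from the Hölder dual exponent $q = p/(p-1)$. The `adaptive_p` schedule mapping concentration to the exponent $p$ is a hypothetical illustration, not the paper's fixed-point formulation.

```python
# Minimal sketch (not the authors' implementation) of gradient-concentration-aware
# l^p-FGSM, assuming per-example gradients g with a leading batch dimension.
import torch

def participation_ratio(g: torch.Tensor) -> torch.Tensor:
    """PR = (sum_i g_i^2)^2 / sum_i g_i^4, computed per example.
    Ranges from 1 (mass concentrated in one dimension) to d (uniformly spread)."""
    flat = g.flatten(start_dim=1)
    s2 = flat.pow(2).sum(dim=1)
    s4 = flat.pow(4).sum(dim=1)
    return s2.pow(2) / (s4 + 1e-12)

def lp_fgsm_step(g: torch.Tensor, eps: float, p: float) -> torch.Tensor:
    """Steepest-ascent perturbation of l^p norm eps, via the Hölder dual q = p/(p-1):
    delta_i = eps * sign(g_i) * |g_i|^(q-1) / ||g||_q^(q-1).
    Recovers FGSM's eps * sign(g) as p -> inf and eps * g / ||g||_2 at p = 2."""
    if p == float("inf"):
        return eps * g.sign()
    q = p / (p - 1.0)
    flat = g.flatten(start_dim=1)
    dual_norm = flat.abs().pow(q).sum(dim=1).pow(1.0 / q)        # ||g||_q per example
    shape = (-1,) + (1,) * (g.dim() - 1)                         # broadcast over remaining dims
    delta = g.sign() * g.abs().pow(q - 1.0) / (dual_norm.view(shape).pow(q - 1.0) + 1e-12)
    return eps * delta

def adaptive_p(g: torch.Tensor, p_min: float = 2.0, p_max: float = 64.0) -> float:
    """Hypothetical schedule: diffuse gradients (PR close to d) permit an aggressive,
    near-l^inf exponent; concentrated gradients (PR close to 1) pull p toward 2."""
    d = g.flatten(start_dim=1).shape[1]
    concentration = 1.0 - (participation_ratio(g) / d).mean().item()
    return p_max - concentration * (p_max - p_min)

# Possible use inside a single-step adversarial training loop (x requires grad):
#   loss = criterion(model(x), y); g = torch.autograd.grad(loss, x)[0]
#   delta = lp_fgsm_step(g, eps=8/255, p=adaptive_p(g))
#   x_adv = (x + delta).clamp(0, 1)
```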
Submission Number: 30