SORA: Free Second-Order Attacks in Fast Adversarial Training

Mazdak Teymourian; Ramtin Moslemi; Farzan Rahmani; Mohammad Hossein Rohban

Published: 30 Apr 2026, Last Modified: 24 Jun 2026

Venue: ICML 2026 regular

License: CC BY 4.0

Abstract: Adversarial Training (AT) is a leading defense against adversarial examples but often suffers from *Catastrophic Overfitting* (CO) in efficient single-step variants, where robustness to multi-step attacks collapses despite high single-step performance. We address this failure mode with two contributions. First, we formalize *Epsilon Overfitting* (EO), a perspective in which fixed perturbation magnitudes and directions exacerbate CO, and show that introducing perturbation variability significantly improves robust generalization across different architectures and datasets. Second, we propose **PertAlign** (Perturbation Alignment), a theoretically grounded, computationally negligible metric that predicts CO onset by measuring gradient alignment across attack stages. Leveraging these insights, we introduce **SORA**, an adaptive step-size AT method that dynamically adjusts perturbations based on loss surface geometry. SORA consistently prevents CO, achieves state-of-the-art robustness and clean accuracy, and generalizes across datasets and architectures using a single fixed set of hyperparameters, which is essential for applicability in fast AT. Extensive experiments on diverse datasets and architectures show that SORA matches or surpasses the robustness of prior methods while delivering higher clean accuracy and superior efficiency. Code is available at [https://github.com/SecondOrderAT/SORA](https://github.com/SecondOrderAT/SORA).
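As a rough illustration of what "gradient alignment across attack stages" can mean, the sketch below takes one single-step (FGSM-style) perturbation and compares the input gradient before and after it using cosine similarity. This is a generic alignment measure and not the paper's PertAlign definition, which the abstract does not specify. All function names and the toy losses are hypothetical and chosen only for illustration.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_step(x, grad, eps):
    """Single-step L-infinity perturbation: x + eps * sign(grad)."""
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def gradient_alignment(grad_fn, x, eps):
    """Cosine similarity between the input gradient at x and the
    input gradient at the single-step perturbed point."""
    g_clean = grad_fn(x)
    x_adv = fgsm_step(x, g_clean, eps)
    g_adv = grad_fn(x_adv)
    return cosine(g_clean, g_adv)


# Hypothetical toy losses, given directly by their input gradients.
def linear_grad(x):
    """Gradient of loss = x0 - 2*x1 (constant, so the surface is flat)."""
    return [1.0, -2.0]


def curved_grad(x):
    """Gradient of loss = sin(3*x0) + sin(3*x1) (strongly curved)."""
    return [3.0 * math.cos(3.0 * xi) for xi in x]


flat = gradient_alignment(linear_grad, [0.0, 0.0], eps=0.5)
small_step = gradient_alignment(curved_grad, [0.0, 0.4], eps=0.01)
large_step = gradient_alignment(curved_grad, [0.0, 0.4], eps=0.5)
```

On the linear toy loss the two gradients coincide, so the alignment is 1 for any step size. On the curved toy loss the alignment stays close to 1 for a small step and turns negative for a large one, showing that a fixed step size can carry the perturbation into a region where the first-step gradient no longer describes the loss surface.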

Lay Summary: Artificial Intelligence (AI) models used for identifying images can be easily tricked. By adding tiny, carefully crafted changes to an image, changes completely invisible to the human eye, attackers can fool the AI into making wildly incorrect classifications. Teaching the AI to defend against these tricks is called Adversarial Training (AT). While effective, AT requires a massive amount of computing power and time. To solve this, researchers developed 'Fast AT' to make the defense process quicker and cheaper. However, Fast AT suffers from a strange glitch known as Catastrophic Overfitting (CO). When this glitch happens, the AI suddenly becomes very good at blocking quick, simple attacks, but completely loses its ability to defend against stronger, more complex ones. Currently, the AI community does not fully understand why this happens. In our work, we uncover the reasons behind this glitch. We explored how the AI reacts to the size of these invisible image changes, introducing a specific perspective to CO which we call 'Epsilon Overfitting.' Based on this discovery, we developed an early-warning metric called PertAlign that can predict when Catastrophic Overfitting is about to happen. Finally, we introduce SORA, a new, highly efficient training method that avoids this glitch entirely, ensuring the AI remains both fast to train and highly secure.

Originally Submitted Supplementary Material: zip

Link To Code: https://github.com/SecondOrderAT/SORA

Primary Area: Deep Learning->Robustness

Keywords: Adversarial Robustness, Adversarial Training, Fast Adversarial Training, Catastrophic Overfitting

Originally Submitted PDF: pdf

Submission Number: 7283
