Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample

ICLR 2026 Conference Submission 17976 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Adversarial sample, Transferable attack
TL;DR: This paper proposes IGSA, a robust adversarial attack framework that significantly improves the resilience of adversarial examples against unknown disturbances through inverse gradient-based sampling and iterative refinement.
Abstract: Adversarial attacks have achieved widespread success across domains, yet existing methods suffer significant performance degradation when adversarial examples are subjected to even minor disturbances. In this paper, we propose IGSA (**I**nverse **G**radient **S**ample-based **A**ttack), a novel and robust attack capable of generating adversarial examples that remain effective under diverse unknown disturbances. IGSA employs an iterative two-step framework: (i) inverse gradient sampling, which searches the neighborhood of the current adversarial example for the most disruptive disturbance direction, and (ii) disturbance-guided refinement, which updates the adversarial example via gradient descent under the identified disruptive disturbance. Theoretical analysis reveals that IGSA enhances robustness by increasing the likelihood of adversarial examples under the data distribution. Extensive experiments in both white-box and black-box attack scenarios demonstrate that IGSA significantly outperforms state-of-the-art attacks in robustness against various unknown disturbances. Moreover, IGSA exhibits superior performance when attacking adversarially trained defense models. Code is available at https://github.com/nimingck/IGSA.
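
To make the two-step loop in the abstract concrete, below is a minimal sketch of how such an attack could be structured, assuming a PyTorch classifier with inputs in [0, 1]. All names and hyperparameters (`igsa_sketch`, `n_samples`, `sigma`, `eps`, `alpha`) are illustrative assumptions, and the random-sampling approximation of the inverse gradient sampling step is a stand-in for the authors' actual procedure; consult the linked repository for the real implementation.

```python
import torch
import torch.nn.functional as F

def igsa_sketch(model, x, y, eps=8/255, alpha=2/255, steps=10,
                n_samples=20, sigma=0.05):
    """Hedged sketch of the two-step loop described in the abstract.

    (i)  inverse gradient sampling: among random disturbances around the
         current adversarial example, pick the one that most *weakens*
         the attack (lowest classification loss w.r.t. the true label),
         approximating the most disruptive disturbance direction;
    (ii) disturbance-guided refinement: ascend the loss gradient taken
         at the disturbed point, so the example stays adversarial even
         under that worst-case disturbance.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        # --- (i) inverse gradient sampling ------------------------------
        worst_d, worst_loss = torch.zeros_like(x_adv), float("inf")
        with torch.no_grad():
            for _ in range(n_samples):
                d = sigma * torch.randn_like(x_adv)
                loss = F.cross_entropy(model(x_adv + d), y).item()
                if loss < worst_loss:  # lowest loss = most disruptive
                    worst_loss, worst_d = loss, d
        # --- (ii) disturbance-guided refinement -------------------------
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv + worst_d), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()               # maximize loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)          # L_inf ball
            x_adv = x_adv.clamp(0, 1).detach()                # valid image
    return x_adv
```

The key difference from a standard PGD loop is that the gradient in step (ii) is evaluated at the disturbed point rather than at the adversarial example itself, so each update favors examples whose adversarial effect survives the worst sampled disturbance.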
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17976