SINAI: Strategic Injection of Noise for Adversarial Defense with Improved Accuracy–Robustness Tradeoffs
Keywords: Adversarial Defense, Vision Models, Noise Injection, Black-box Attacks, Accuracy–Robustness Trade-offs
TL;DR: We propose SINAI, a strategic noise injection defense that improves the trade-off between clean and robust accuracy of vision models under query-based black-box attacks.
Abstract: Vision models are widely deployed on edge devices, yet they are highly vulnerable to query-based black-box adversarial attacks. Existing noise injection defenses, while promising, often overlook unique characteristics of vision models, such as the heterogeneity between attention and feed-forward modules, which fundamentally shapes how injected noise propagates.
As a result, previous methods yield suboptimal trade-offs between clean and robust accuracy. In this work, we scrutinize noise injection for vision models and offer two insights: (1) noise injection should explicitly account for the activation function, since Softmax-based attention modules respond to noise differently from GeLU- or ReLU-based FFN and CNN modules, and (2) gradient norms vary across logits within the same layer, so injecting uniform noise into all logits can perturb high-gradient logits and hurt clean accuracy, motivating a fine-grained, logit-aware allocation strategy. Building on these observations, we propose an adaptive noise injection defense that combines module-level and logit-level allocation: injecting stronger noise into Softmax-based attention modules while applying fine-grained, logit-aware noise in GeLU/ReLU-based FFN and convolutional modules. We then formulate the search for noise injection hyperparameters as a constrained optimization problem in which the clean-accuracy drop is bounded, and solve it via Bayesian optimization. Experiments with ViT-B-16-224 on ImageNet show that our method improves average robust accuracy by 4.2\% over a feature noise defense baseline under three query-based black-box attacks with 10k steps each, at an $\approx$2\% drop in clean accuracy.
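To make the module-level and logit-level allocation concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation): it injects stronger Gaussian noise into a Softmax-based attention module and weaker, gradient-norm-aware noise into a GeLU MLP via forward hooks. The toy block, the noise scales `sigma`, and the per-unit gradient norms are illustrative assumptions; in SINAI such hyperparameters would instead be chosen by the constrained Bayesian optimization described above.

```python
# Minimal sketch of module- and logit-aware noise injection at inference time.
# All scales and gradient-norm values are illustrative placeholders.
import torch
import torch.nn as nn


def make_noise_hook(sigma, grad_norm=None):
    """Forward hook that adds Gaussian noise to a module's output.

    If per-unit gradient norms are supplied, noise is scaled inversely with
    them so high-gradient units (most important for clean accuracy) receive
    less noise; otherwise a single module-level scale is used.
    """
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        if grad_norm is not None:
            scale = sigma / (1.0 + grad_norm)   # fine-grained, per-unit allocation
        else:
            scale = sigma                       # uniform, module-level allocation
        noisy = out + scale * torch.randn_like(out)
        if isinstance(output, tuple):
            return (noisy,) + tuple(output[1:])
        return noisy
    return hook


class ToyViTBlock(nn.Module):
    """A ViT-style block: Softmax self-attention followed by a GeLU MLP."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        x = x + a
        return x + self.mlp(x)


block = ToyViTBlock().eval()

# Hypothetical per-unit gradient norms for the MLP output (in practice these
# would be estimated on a small clean calibration set); random stand-ins here.
grad_norm_mlp = torch.rand(64)

# Module-level allocation: stronger noise into the Softmax-based attention,
# weaker but logit-aware noise into the GeLU MLP.
block.attn.register_forward_hook(make_noise_hook(sigma=0.10))
block.mlp.register_forward_hook(make_noise_hook(sigma=0.02, grad_norm=grad_norm_mlp))

with torch.no_grad():
    out = block(torch.randn(1, 16, 64))   # (batch, tokens, dim)
print(out.shape)                           # torch.Size([1, 16, 64])
```

Because the hooks only perturb activations, model weights stay untouched and the defense can be enabled or disabled per query; under the paper's formulation, the per-module noise scales would be searched subject to a bound on the clean-accuracy drop rather than hard-coded as above.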
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 20693