Keywords: Safety of AI systems, adversarial robustness, randomized models
TL;DR: We show that randomized models are inherently vulnerable to the nagging attack, and propose seed-rotation as a mitigation.
Abstract: There has been a long debate on whether inference-time randomness enhances or reduces the safety of deep learning models.
In this paper, we formalize a realistic threat model, and provide both theoretical and empirical evidence demonstrating that randomness in inference undermines model safety.
The threat we consider is the *nagging attack*, in which an adversary repeatedly queries the model with the same adversarial example, exploiting the model’s inherent randomness until a failure occurs.
Theoretically, we show that the nagging attack exposes a fundamental vulnerability of randomized models,
one that cannot be eliminated by refining their design.
To address this challenge, we propose *seed-rotation*, a deployment strategy that de-randomizes the model within fixed intervals while retaining stochasticity across intervals.
We prove that seed-rotation makes the randomized model safer.
Empirically, we compare seed-rotation with standard randomized deployment across multiple randomized defense methods.
The results consistently show that seed-rotation achieves higher adversarial robustness than standard randomized deployment.
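The intuition behind the nagging attack and the seed-rotation defense can be illustrated with a toy simulation (not the paper's actual construction; the model, failure probability, and caching scheme below are illustrative assumptions). If a randomized model fails an identical query independently with probability p per draw, k repeated queries succeed with probability 1 − (1 − p)^k, which approaches 1; fixing the seed within an interval makes repeated identical queries deterministic, so repetition gains the attacker nothing within that interval:

```python
import random

def nagging_attack(model, x, k):
    """Repeat the same query k times; the attack succeeds if any response fails."""
    return any(model(x) for _ in range(k))

def randomized_model(x):
    # Toy randomized model: fails (returns True) independently with prob 0.01 per query.
    return random.random() < 0.01

def make_seed_rotated_model(p_fail=0.01):
    # Toy seed-rotation: draw one random seed for the whole interval, so
    # identical queries within the interval get identical (cached) answers.
    seed = random.random()
    cache = {}
    def model(x):
        if x not in cache:
            cache[x] = seed < p_fail  # hypothetical per-interval failure outcome
        return cache[x]
    return model

# Against the fully randomized model, 1000 nags succeed with prob ~1 - 0.99**1000 ≈ 0.99996.
# Against the seed-rotated model, 1000 nags succeed iff a single query would have.
```

Under these toy assumptions, repetition amplifies the per-query failure probability only when fresh randomness is drawn on every query, which is the gap seed-rotation closes within each interval.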
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8240