Keywords: Adversarial robustness, Query-based black-box attacks, Adversarial defense
Abstract: Query-based black-box attack algorithms can compute imperceptible adversarial perturbations that mislead learned models while relying only on model outputs. The success of these attacks poses a significant problem, especially for Machine Learning as a Service (MLaaS) providers. Our study explores a new approach to obfuscating information from an attacker. To craft an adversarial example, an attack exploits the relationship between successive query responses to optimize a perturbation. Our idea for obfuscating this relationship is to randomly select a model from a diverse set of models to respond to each query. This randomization violates the attacker's assumption that model parameters remain unaltered between the queries used to extract information. What remains unclear is whether model randomization leads to sufficient obfuscation to confuse attacks, and how best to build such a method. This study seeks answers to these questions. Our theoretical analysis proves that this approach consistently increases robustness. Extensive experiments across 7 state-of-the-art attacks and all major perturbation norms ($l_\infty$, $l_2$, $l_0$), including adaptive variants, confirm its effectiveness. Importantly, our findings reveal a new avenue for investigating robust methods against black-box attacks, offering theoretical understanding and a practical implementation pathway.
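
The abstract's core mechanism, serving each query from a randomly sampled member of a diverse model pool, can be illustrated with a minimal sketch. This is not the paper's implementation; the class name `RandomizedModelPool`, its interface, and the toy models in the usage example are all illustrative assumptions.

```python
import random
from typing import Callable, Optional, Sequence

# Minimal sketch of the idea described above, not the paper's method.
# Each "model" is any callable mapping an input to model outputs
# (e.g., logits or a predicted label); names here are hypothetical.

class RandomizedModelPool:
    """Answer each query with a model drawn at random from a diverse pool.

    Because a fresh model is sampled per query, successive responses no
    longer come from one fixed set of parameters, breaking the relationship
    a query-based black-box attack exploits to optimize its perturbation.
    """

    def __init__(self, models: Sequence[Callable], rng: Optional[random.Random] = None):
        if not models:
            raise ValueError("model pool must be non-empty")
        self.models = list(models)
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, x):
        model = self.rng.choice(self.models)  # independent draw for every query
        return model(x)


# Hypothetical usage: three stand-in classifiers take the place of a real model set.
if __name__ == "__main__":
    pool = RandomizedModelPool([lambda x: "cat", lambda x: "dog", lambda x: "cat"])
    print([pool("image") for _ in range(5)])
```

One design point the sketch makes concrete: sampling is independent per query, so even repeated identical inputs may receive responses from different models, which is precisely what obfuscates the attacker's view of the decision surface.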
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23304