A General Framework for Black-Box Attacks Under Cost Asymmetry

ICLR 2026 Conference Submission 19467 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: zeroth-order optimization, asymmetric cost, black-box adversarial attacks
Abstract: Traditional decision-based black-box adversarial attacks on image classifiers aim to generate adversarial examples by slightly modifying input images while keeping the number of queries low, where each query involves sending an input to the model and observing its output. Most existing methods assume that all queries have equal cost. However, in practice, queries may incur *asymmetric costs*; for example, in content moderation systems, certain output classes may trigger additional review, enforcement, or penalties, making them more costly than others. While prior work has considered such asymmetric cost settings, effective algorithms for this scenario remain underdeveloped. In this paper, we introduce asymmetric black-box attacks, a new family of decision-based attacks that generalizes to the asymmetric query-cost setting. We develop new methods for boundary search and gradient estimation when crafting adversarial examples. Specifically, we propose *Asymmetric Search (AS)*, a more conservative alternative to binary search that reduces reliance on high-cost queries, and *Asymmetric Gradient Estimation (AGREST)*, which shifts the sampling distribution in Monte Carlo-style gradient estimation to favor low-cost queries. We design efficient algorithms that reduce total attack cost by balancing different query types, in contrast to earlier methods such as *stealthy attacks* that focus only on limiting the number of expensive (high-cost) queries. We perform both theoretical analysis and empirical evaluation on standard image classification benchmarks. Across various cost regimes, our method consistently achieves lower total query cost and smaller perturbations than existing approaches, reducing the perturbation norm by up to 40% in some settings.
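To make the two ingredients concrete, here is a minimal sketch of how an asymmetric boundary search and a shifted-sampling gradient estimator could look. This is an illustration under stated assumptions, not the submission's implementation: the split fraction `alpha`, the shift magnitude `shift`, and the `oracle(x) -> bool` interface (True meaning the query lands in the high-cost class) are hypothetical placeholders, and the paper presumably derives these quantities from the query-cost ratio.

```python
import numpy as np

def asymmetric_search(x_benign, x_adv, oracle, alpha=0.3, tol=1e-3):
    """Boundary search along the segment from x_benign (low-cost side) to
    x_adv (high-cost side). Plain binary search uses alpha = 0.5; probing
    at alpha < 0.5 keeps most queries on the benign side, trading extra
    cheap queries for fewer expensive ones."""
    lo, hi = 0.0, 1.0                        # lo: known benign, hi: known adversarial
    while hi - lo > tol:
        t = lo + alpha * (hi - lo)           # conservative split point
        x_t = (1 - t) * x_benign + t * x_adv
        if oracle(x_t):                      # expensive outcome: boundary lies below t
            hi = t
        else:                                # cheap outcome: advance the benign bound
            lo = t
    return (1 - hi) * x_benign + hi * x_adv  # point just on the adversarial side

def shifted_gradient_estimate(x_b, oracle, dim, n=100, delta=0.01,
                              shift=0.5, g_prev=None):
    """Monte Carlo gradient estimate at a boundary point x_b. Baseline
    estimators average sign(phi_i) * u_i over u_i ~ N(0, I); here the
    sampling mean is shifted toward the benign side (-g_prev) so fewer
    probes land in the high-cost region, with importance weights
    correcting the resulting bias."""
    if g_prev is None:
        mu = np.zeros(dim)
    else:
        mu = -shift * g_prev / np.linalg.norm(g_prev)
    grad = np.zeros(dim)
    for _ in range(n):
        u = np.random.randn(dim) + mu                   # draw from N(mu, I)
        phi = 1.0 if oracle(x_b + delta * u) else -1.0  # +1: costly side, -1: cheap side
        w = np.exp(-mu @ u + 0.5 * mu @ mu)             # weight N(u; 0, I) / N(u; mu, I)
        grad += w * phi * u
    return grad / n

# Toy check: flagged region is x[0] > 0.5, so queries there are "expensive".
oracle = lambda x: x[0] > 0.5
x_b = asymmetric_search(np.zeros(8), np.ones(8), oracle, alpha=0.3)
g = shifted_gradient_estimate(x_b, oracle, dim=8, n=200)  # roughly aligned with e_0
```

With `alpha = 0.5` and `shift = 0` this degenerates to the standard binary-search-plus-uniform-sampling pipeline; the asymmetric variants spend more cheap queries to avoid expensive ones, mirroring the cost balancing the abstract describes.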
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 19467