Abstract: Despite its importance, generating attacks for multi-label learning (MLL) models has received much less attention than multi-class recognition. Attacking an MLL model by optimizing a loss on the target set of labels often has the undesired consequence of changing the predictions
for other labels. On the other hand, adding a loss on the remaining labels to keep them fixed leads to highly negatively correlated gradient directions, which reduces the attack's effectiveness. In this paper, we develop a framework for crafting
effective and semantic-aware adversarial attacks for MLL.
First, to obtain an attack that leads to semantically consistent predictions across all labels, we find a minimal superset of the target labels, referred to as the consistent target set.
To do so, we develop an efficient search algorithm over a knowledge graph that encodes label dependencies (a simplified closure computation is sketched after the abstract). Next,
we propose an optimization that searches for an attack that modifies the predictions of the labels in the consistent target set while ensuring that the remaining labels are not affected. This leads to an efficient algorithm that projects the gradient of the consistent target set loss onto the direction orthogonal to the gradient of the loss on the remaining labels (also sketched below). Our framework can generate attacks for target sets of different sizes and for MLL models with thousands of labels (as in OpenImages). Finally, through extensive experiments on three datasets and several MLL models, we show that our method generates attacks that are both successful and semantically consistent.
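The abstract does not spell out the search algorithm over the knowledge graph. As a rough illustration only, if the graph is assumed to store implication edges between labels (e.g., dog implies animal), a minimal consistent superset of the targets is their closure under those edges. The function name `consistent_target_set` and the `implies` adjacency format are assumptions for this sketch, not the paper's actual interface:

```python
from collections import deque

def consistent_target_set(targets, implies):
    """Return the minimal superset of `targets` closed under implication
    edges: if u is in the set and u implies v, then v is too.
    `implies[u]` is an iterable of labels entailed by label u (assumed
    adjacency-list encoding of the knowledge graph)."""
    closed = set(targets)
    queue = deque(targets)
    while queue:
        u = queue.popleft()
        for v in implies.get(u, ()):
            if v not in closed:
                closed.add(v)
                queue.append(v)
    return closed

# Toy usage: attacking "dog" also requires predicting "animal".
print(consistent_target_set({"dog"}, {"dog": ["animal"], "cat": ["animal"]}))
# -> {'dog', 'animal'}
```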
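The gradient-projection step, by contrast, follows directly from the description above: remove from the target-set gradient its component along the gradient of the loss on the remaining labels. Below is a minimal PyTorch-style sketch; `loss_target`, `loss_others`, and the single-step interface are illustrative assumptions rather than the authors' code:

```python
import torch

def projected_attack_step(x, loss_target, loss_others, step_size=1e-2):
    """One ascent step whose direction is the gradient of the consistent
    target set loss, projected onto the subspace orthogonal to the gradient
    of the loss on the remaining labels (illustrative sketch)."""
    x = x.clone().detach().requires_grad_(True)

    g_t = torch.autograd.grad(loss_target(x), x)[0]  # target-set gradient
    g_o = torch.autograd.grad(loss_others(x), x)[0]  # remaining-labels gradient

    # Subtract the component of g_t along g_o; to first order, moving along
    # g_proj changes the target-set loss without changing the other loss.
    g_o_flat = g_o.flatten()
    coef = torch.dot(g_t.flatten(), g_o_flat) / (g_o_flat.dot(g_o_flat) + 1e-12)
    g_proj = g_t - coef * g_o

    return (x + step_size * g_proj).detach()
```

In a full attack this step would be iterated under a perturbation budget (e.g., clipped to an l-infinity ball, PGD-style); the sketch isolates only the orthogonal projection of the two gradients.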