Abstract: Deep neural networks (DNNs) are known to be vulnerable to backdoor attacks. Specifically, the attacker endeavors to implant backdoors in the DNN model by injecting a set of poisoned samples such that the malicious model predicts target labels once the backdoor is triggered. The clean-image attack has recently emerged as a threat in multi-label classification, where an attacker is able to poison training labels without tampering with image contents. In this paper, we propose a simple but effective method to alleviate clean-image backdoor attacks. Considering the difference in weight convergence between the benign model and the backdoored model, our method relies on partial weight initialization and fine-tuning to mitigate the backdoor behaviors of a suspicious model. The fine-tuned model sustains its clean accuracy through knowledge distillation over a few iterations. Importantly, our approach does not require extra clean images for purification. Extensive experiments demonstrate the effectiveness of our defense against clean-image attacks for multi-label classification on two benchmark datasets.
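To make the abstract's pipeline concrete, the sketch below illustrates the general idea of partial weight re-initialization followed by distillation-based fine-tuning. It is a minimal PyTorch illustration under our own assumptions: the choice of which layers to re-initialize, the use of the original suspicious model as the distillation teacher, and the sigmoid-based soft-label loss are illustrative and not the paper's exact procedure.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def reinitialize_partial_weights(model: nn.Module, layer_names: set) -> None:
    """Re-initialize only the named sub-modules of a suspicious model."""
    for name, module in model.named_modules():
        if name in layer_names and hasattr(module, "reset_parameters"):
            module.reset_parameters()


def distill_finetune(suspicious_model: nn.Module,
                     loader: torch.utils.data.DataLoader,
                     layer_names: set,
                     num_iters: int = 100,
                     temperature: float = 2.0,
                     lr: float = 1e-3,
                     device: str = "cpu") -> nn.Module:
    """Keep a frozen copy of the suspicious model as teacher, re-initialize
    part of the student, then fine-tune the student for a few iterations to
    match the teacher's soft predictions (no extra clean images assumed)."""
    teacher = copy.deepcopy(suspicious_model).to(device).eval()
    student = suspicious_model.to(device)
    reinitialize_partial_weights(student, layer_names)
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)

    data_iter = iter(loader)
    for _ in range(num_iters):
        try:
            images, _ = next(data_iter)  # labels are not used for distillation
        except StopIteration:
            data_iter = iter(loader)
            images, _ = next(data_iter)
        images = images.to(device)

        with torch.no_grad():
            teacher_logits = teacher(images)

        student_logits = student(images)
        # Soft-label distillation loss; sigmoid outputs fit the
        # multi-label classification setting.
        loss = F.binary_cross_entropy_with_logits(
            student_logits / temperature,
            torch.sigmoid(teacher_logits / temperature),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student
```

In this sketch, re-initializing only a subset of layers discards weights that may encode the backdoor while keeping most of the model intact, and the short distillation loop restores clean accuracy from the teacher's soft outputs rather than from additional clean data.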