A Double-Edged Sword: The Power of Two in Defending Against DNN Backdoor Attacks

Published: 26 Aug 2024 · Last Modified: 09 Nov 2024 · EUSIPCO 2024 · CC BY 4.0
Abstract: Backdoor attacks on deep neural networks inject malicious behavior into a model during training. That behavior can then be activated at test time using cleverly crafted triggers. Defending against backdoors is central to machine learning security, as it safeguards the trust between model providers and users. This paper demonstrates that defense performance against a representative selection of backdoor attacks remains an open problem, with a main focus on input purification (a valuable defense category in black-box contexts, where all DNN inputs are preprocessed in the hope of erasing a potential trigger). We show that current defenses are adversary-aware and dataset-dependent: they typically target patch-based attacks and simpler image classification datasets. This brittleness of stand-alone defenses highlights the cat-and-mouse game currently affecting the backdoor literature. In this context, we propose a two-defense strategy that combines existing methods as a palliative solution while more robust defenses are developed.
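To make the input-purification setting concrete, here is a minimal sketch of a black-box purification front end and the two-defense composition the abstract alludes to. The abstract does not specify which defenses are combined, so the purifiers below (JPEG re-encoding and Gaussian blur, two common purification baselines) are stand-ins, and the names `jpeg_purify`, `blur_purify`, `two_defense_pipeline`, and `model_predict` are hypothetical:

```python
import io
from PIL import Image, ImageFilter


def jpeg_purify(img: Image.Image, quality: int = 50) -> Image.Image:
    """Re-encode the image as lossy JPEG; compression can destroy
    high-frequency trigger patterns while roughly preserving content."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


def blur_purify(img: Image.Image, radius: float = 1.5) -> Image.Image:
    """Low-pass filter the image; blurring attenuates small localized
    (e.g. patch-based) triggers."""
    return img.filter(ImageFilter.GaussianBlur(radius))


def two_defense_pipeline(img: Image.Image, model_predict):
    """Chain two complementary purifiers before querying the model.
    `model_predict` is any black-box classifier taking a PIL image;
    the model itself is never modified, only its inputs."""
    purified = blur_purify(jpeg_purify(img))
    return model_predict(purified)
```

The design point is that each purifier is effective against a different trigger family, so composing two of them hedges against the dataset- and adversary-dependence of any single defense, at the cost of some clean-accuracy degradation from the extra preprocessing.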