Abstract: Deep neural networks (DNNs) are susceptible to backdoor attacks due to their black-box nature and lack of interpretability. Backdoor attacks manipulate a model's predictions once hidden backdoors are activated by predefined triggers. Although considerable progress has been made in backdoor detection and removal at the model deployment stage, effective defenses against backdoor attacks at training time remain under-explored. In this paper, we propose a novel training-time backdoor defense method called Learning from Distinction (LfD), which enables training a backdoor-free model on backdoor-poisoned data. LfD uses a low-capacity model as a teacher to guide the learning of a backdoor-free student model via a dynamic weighting strategy. Extensive experiments on the CIFAR-10, GTSRB and ImageNet-subset datasets show that LfD significantly reduces attack success rates to 0.67\%, 6.14\% and 1.42\%, respectively, while degrading clean accuracy by less than 1\%, 3\% and 1\%.
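To make the teacher-guided dynamic weighting idea concrete, below is a minimal illustrative sketch. It assumes a simple rule in which samples that a low-capacity teacher already fits well (a plausible backdoor shortcut) are down-weighted in the student's loss; the function name, the module arguments, and the softmax-based weighting rule are assumptions for illustration, not the paper's exact LfD procedure.

```python
# Illustrative sketch only -- NOT the paper's exact LfD algorithm.
# Assumption: a low-capacity teacher fits backdoor-shortcut samples quickly,
# so low teacher loss is used here as a proxy for "likely poisoned".
import torch
import torch.nn.functional as F

def weighted_student_step(teacher, student, optimizer, images, labels, temperature=1.0):
    """One training step: weight each sample by how poorly the teacher fits it,
    so samples the teacher memorizes easily contribute less to the student update."""
    with torch.no_grad():
        teacher_loss = F.cross_entropy(teacher(images), labels, reduction="none")
        # Higher teacher loss -> larger weight; softmax normalizes within the batch,
        # rescaled so the average weight stays near 1.
        weights = torch.softmax(teacher_loss / temperature, dim=0) * len(labels)

    student_loss = F.cross_entropy(student(images), labels, reduction="none")
    loss = (weights * student_loss).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```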
Primary Subject Area: [Content] Vision and Language
Relevance To Conference: In multimedia applications, visual information such as images and videos holds a prominent position. The advancement of deep learning has pushed the boundaries of digital image processing, but it has also introduced numerous security concerns. This work proposes a novel training-time defense against backdoor attacks in computer vision. Our defense enhances the robustness of multimedia processing systems, safeguarding them from malicious attacks, fostering advances in security protection within the multimedia/multimodal processing domain, and laying the technical foundation for safer and more reliable multimedia applications.
Submission Number: 2006