Knowledge-Driven Backdoor Removal in Deep Neural Networks via Reinforcement Learning

Jiayin Song; Yike Li; Yunzhe Tian; Xingyu Wu; Qiong Li; Endong Tong; Wenjia Niu; Zhenguo Zhang; Jiqiang Liu


Published: 01 Jan 2024, Last Modified: 29 Oct 2024, KSEM (3) 2024, CC BY-SA 4.0

Abstract: Backdoor attacks have become a major security threat to deep neural networks (DNNs), prompting significant research on backdoor removal to mitigate them. However, existing backdoor removal methods typically work independently and struggle to generalize across attacks, which limits their effectiveness when the attacker's specific method is unknown. To defend effectively against multiple backdoor attacks, we propose the Reinforcement Learning-based Backdoor Removal (RLBR) framework, which integrates multiple defense strategies and dynamically switches between defense methods during the removal process. Motivated by our observations that a) neuron activation patterns vary significantly under different attacks, and b) these patterns change dynamically during the removal process, we take the neuron activation pattern of the poisoned models as the environment state in the RLBR framework. We also use the evaluated defense effectiveness as the reward, which guides the selection of the optimal defense strategy at each decision point. In extensive experiments against six state-of-the-art backdoor attacks on two benchmark datasets, RLBR improved defensive performance by 6.91% while maintaining an accuracy of 92.63% on clean datasets, compared to seven baseline backdoor defense methods.
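
The state, action, and reward roles described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not name the defense methods, the state encoding, or the learning algorithm, so the action names (`fine_tune`, `prune`, `distill`), the bucketed state standing in for a neuron activation pattern, the simulated effect tables, and the use of tabular Q-learning are all assumptions made for illustration only.

```python
import random

# Hypothetical defense actions; the abstract does not list the actual methods.
ACTIONS = ["fine_tune", "prune", "distill"]


class ToyRemovalEnv:
    """Simulated stand-in for a poisoned model undergoing backdoor removal.

    The state is a coarse bucket (0..3) of the attack success rate (ASR),
    used as a placeholder for the neuron activation pattern.
    All numbers below are made up for the simulation.
    """

    # Assumed ASR reduction of each action, indexed by state bucket.
    EFFECT = {
        "fine_tune": [0.02, 0.05, 0.10, 0.30],
        "prune": [0.02, 0.10, 0.30, 0.10],
        "distill": [0.05, 0.30, 0.10, 0.05],
    }
    # Assumed clean-accuracy cost of applying each action once.
    ACC_COST = {"fine_tune": 0.005, "prune": 0.02, "distill": 0.01}

    def reset(self):
        self.asr = 1.0   # attack success rate of the poisoned model
        self.acc = 0.93  # clean accuracy
        return self._state()

    def _state(self):
        return min(3, int(self.asr * 4))

    def step(self, action):
        drop = min(self.asr, self.EFFECT[action][self._state()])
        cost = self.ACC_COST[action]
        self.asr -= drop
        self.acc -= cost
        # Reward: defense effectiveness (ASR reduction) minus accuracy loss.
        reward = drop - cost
        done = self.asr < 0.05
        return self._state(), reward, done


def train(episodes=500, max_steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over defense actions."""
    rng = random.Random(seed)
    env = ToyRemovalEnv()
    q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued defense action for each state bucket."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(4)}


def rollout(policy, max_steps=20):
    """Apply the learned policy once and report the chosen defense sequence."""
    env = ToyRemovalEnv()
    state = env.reset()
    chosen = []
    for _ in range(max_steps):
        action = policy[state]
        chosen.append(action)
        state, _, done = env.step(action)
        if done:
            break
    return chosen, env.asr, env.acc


if __name__ == "__main__":
    policy = greedy_policy(train())
    print(policy)
    print(rollout(policy))
```

In this sketch the agent may pick a different defense in each state bucket, which mirrors the idea of switching defense methods as the model changes during removal. A real system would replace the simulated tables with measurements taken on an actual model.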
