Abstract: This work proposes GangSweep, a new backdoor detection framework that leverages the strong reconstructive power of Generative
Adversarial Networks (GANs) to detect and “sweep out” neural
backdoors. It is motivated by a series of empirical investigations, which reveal that the perturbation masks generated by a GAN
are persistent and exhibit distinctive statistical properties:
low shifting variance and large shifting distance in feature space.
Compared with previous solutions, the proposed approach eliminates the reliance on access to training data and shows a high
degree of robustness and efficiency in detecting and mitigating a
wide range of backdoored models under various settings. Moreover,
this is the first work that successfully leverages generative networks to defend against advanced neural backdoors with multiple
triggers and their polymorphic forms.