Keywords: backdoor attacks, model interpretation
Abstract: Backdoor attacks train models on a mixture of poisoned and clean data to implant backdoor triggers into the model. An interesting phenomenon has been observed during training: the loss of poisoned samples tends to drop significantly faster than that of clean samples, which we call the early-fitting phenomenon. Early-fitting enables a simple but effective defense against backdoor attacks, since poisoned samples can be identified by selecting the samples with the lowest loss values in the early training epochs. Two natural questions arise: (1) What characteristics of poisoned samples cause early-fitting? (2) Is it possible to design stronger attacks that circumvent existing defense methods? To answer the first question, we find that early-fitting can be attributed to a unique property of poisoned samples called synchronization, which describes the latent similarity between two samples. Moreover, the degree of synchronization can be explicitly controlled, depending on whether it is captured by the shallow or deep layers of the model. We then give an affirmative answer to the second question by proposing a new backdoor attack method, Deep Backdoor Attack (DBA), which uses deep synchronization to generate trigger patterns in reverse by activating neurons in the deep layer. Experimental results validate our propositions and the effectiveness of DBA. Our code is available at https://anonymous.4open.science/r/Deep-Backdoor-Attack-8875
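The loss-based defense mentioned in the abstract (flagging the samples with the lowest loss in early epochs) can be illustrated with a minimal sketch. This is not the authors' implementation or any specific published defense: the function name `flag_lowest_loss`, the `fraction` parameter, and the toy loss values are all assumptions made for illustration, and the per-sample losses are assumed to have been recorded already after an early training epoch.

```python
import numpy as np

def flag_lowest_loss(per_sample_loss, fraction):
    """Return sorted indices of the `fraction` of samples with the lowest loss.

    Hypothetical helper: `per_sample_loss` is assumed to hold one loss value
    per training sample, measured after an early epoch.
    """
    losses = np.asarray(per_sample_loss, dtype=float)
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Number of samples to flag; always flag at least one.
    k = max(1, int(round(fraction * losses.size)))
    # argsort is ascending, so the first k entries are the lowest-loss samples.
    return np.sort(np.argsort(losses, kind="stable")[:k])

# Toy, made-up losses: indices 1 and 4 stand in for early-fitted poisoned samples.
toy_losses = [2.1, 0.05, 1.8, 2.4, 0.10, 1.9]
suspects = flag_lowest_loss(toy_losses, fraction=1 / 3)
print(suspects.tolist())
```

In practice the flagged fraction would have to be chosen with some estimate of the poisoning rate; the abstract does not specify one, so the value used here is arbitrary.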
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)