Certified Copy: A Resistant Backdoor Attack

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Backdoor attack, Deep Neural Network, Detection methods
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A resistant backdoor attack designed to evade detection methods and to demonstrate the potential malicious uses of deep neural networks.
Abstract: The robustness, security, and safety of artificial intelligence systems have become a major concern in recent studies. One of the most significant and thoroughly investigated threats to deep learning models is the backdoor attack. Despite the numerous backdoor detection mechanisms developed for computer vision systems, our research shows that even simple backdoor attacks can bypass these defenses if the backdoor planting process and the poisoning data are carefully crafted. To evade existing backdoor detection systems, we propose a new backdoored model called Certified Copy, which is trained using a novel cost function. This cost function controls the activation of neurons in the model so that the activations generated by clean inputs are similar to those produced by poisoned inputs. During training, the model learns to copy the corresponding clean model in all situations except when it is fed poisoned inputs. We tested our model against six state-of-the-art defense mechanisms: Neural Cleanse, TAO, ABS, TABOR, NNoculation, and STRIP. The results show that most of these methods fail to detect the backdoored model. We conclude that deep learning models have a vast hypothesis space, which malicious attackers can exploit to hide malicious neuron activations triggered by poisoned data, leading to undetected backdoored models.
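The abstract describes, but does not spell out, a cost function that keeps activations on poisoned inputs close to those of the clean model. The sketch below is only one plausible reading of that idea, assuming an L2 activation-alignment penalty against a frozen clean reference model; the names certified_copy_loss, hidden_activations, and lambda_act are hypothetical and do not come from the paper.

```python
# Illustrative sketch, not the authors' exact formulation: combine the usual
# clean-task loss, the backdoor (target-label) loss on poisoned inputs, and an
# assumed L2 penalty pulling the backdoored model's hidden activations on
# poisoned inputs toward those of a frozen clean reference model.
import torch
import torch.nn.functional as F

def certified_copy_loss(model, reference_model, x_clean, y_clean,
                        x_poison, y_target, lambda_act=1.0):
    # Standard classification loss on clean inputs.
    loss_clean = F.cross_entropy(model(x_clean), y_clean)

    # Backdoor objective: poisoned inputs should map to the attacker's target label.
    loss_attack = F.cross_entropy(model(x_poison), y_target)

    # Alignment term: hidden activations on poisoned inputs should resemble the
    # clean reference model's activations on the same inputs, so that
    # activation-based defenses see "clean-looking" behaviour.
    # `hidden_activations` is an assumed helper returning a list of layer outputs.
    acts_backdoor = model.hidden_activations(x_poison)
    with torch.no_grad():
        acts_reference = reference_model.hidden_activations(x_poison)
    loss_align = sum(F.mse_loss(a, b) for a, b in zip(acts_backdoor, acts_reference))

    return loss_clean + loss_attack + lambda_act * loss_align
```

In this reading, lambda_act trades off backdoor effectiveness against how closely the poisoned-input activations mimic the clean model, which is the property that would make activation-inspection defenses less effective.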
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8074