Keywords: backdoor attack, backdoor defense, AI security
Abstract: Backdoor attacks in machine learning create hidden vulnerabilities by manipulating model behaviour with specific triggers. Such attacks often go unnoticed because the model operates as expected on normal inputs. It is therefore imperative to understand the intricate mechanisms of backdoor attacks. To address this challenge, in this work we introduce three key requirements that a backdoor attack must meet. Moreover, we observe that current backdoor attack algorithms, whether employing fixed or input-dependent triggers, are tightly bound to model parameters, rendering them easier to defend against. To tackle this issue, we propose the Key-Locks algorithm, which separates the backdoor attack process into two stages: embedding locks in the model and employing a key to unlock them. This design enables the unlocking level to be adjusted to counteract diverse defense mechanisms. Extensive experiments are conducted to evaluate the effectiveness of our proposed algorithm. Our code is available at: https://anonymous.4open.science/r/KeyLocks-FD85
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9344