TL;DR: We propose a defense leveraging neural networks' weight symmetry to remove backdoors, with theoretical guarantees of effectiveness across various learning settings.
Abstract: Deep neural networks are vulnerable to backdoor attacks, where malicious behaviors are implanted during training. While existing defenses can effectively purify compromised models, they typically require labeled data or specific training procedures, making them difficult to apply beyond supervised learning settings. Notably, recent studies have shown successful backdoor attacks across various learning paradigms, highlighting a critical security concern. To address this gap, we propose Two-stage Symmetry Connectivity (TSC), a novel backdoor purification defense that operates independently of data format and requires only a small fraction of clean samples. Through theoretical analysis, we prove that by leveraging permutation invariance in neural networks and quadratic mode connectivity, TSC amplifies the loss on poisoned samples while maintaining bounded clean accuracy. Experiments demonstrate that TSC achieves robust performance comparable to state-of-the-art methods in supervised learning scenarios. Furthermore, TSC generalizes to self-supervised learning frameworks, such as SimCLR and CLIP, maintaining its strong defense capabilities. Our code is available at https://github.com/JiePeng104/TSC.
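The permutation invariance the abstract relies on can be illustrated with a minimal sketch (illustrative only, not the paper's TSC implementation): permuting the hidden units of a two-layer MLP, with matching permutations of the weight matrices, produces a different point in weight space that computes exactly the same function.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

def mlp(x, W1, b1, W2):
    # Two-layer network with a ReLU hidden layer.
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

# Permute hidden units: rows of W1, entries of b1, columns of W2.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

# The permuted weights define the same input-output function.
x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2), mlp(x, W1p, b1p, W2p))
```

Because many functionally identical copies of a trained network exist under such symmetries, a defense can connect them in weight space; how TSC combines this with quadratic mode connectivity is detailed in the paper itself.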
Lay Summary: Imagine an AI system, like one that helps doctors diagnose diseases or powers your favorite app. What if this AI could be secretly tricked during its learning phase to make specific, harmful mistakes later on, without anyone noticing the manipulation until it's too late? This is called a "backdoor attack," and it makes AI systems vulnerable and untrustworthy. Repairing a compromised AI is a major challenge: current fixes often require specific kinds of data or complex training setups, which limits where they can be applied, especially now that these attacks appear across many styles of AI learning. Our research introduces a more flexible way to purify compromised AI systems, called Two-stage Symmetry Connectivity (TSC). TSC stands out because it works regardless of the data type (images, sound, etc.) and needs only a tiny amount of clean data. We discovered that by exploiting certain inherent mathematical properties of neural networks, TSC makes the "poisoned" behavior stand out so it can be neutralized, while keeping the AI accurate on its original tasks.
Link To Code: https://github.com/JiePeng104/TSC
Primary Area: Social Aspects->Security
Keywords: AI security, backdoor defenses, mode connectivity, permutation invariance
Submission Number: 11061