On the Fragility of Graph Backdoor Defenses: A Robust Strategy via Layer-wise Feature Divergence

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Graph backdoor defense; Robustness; Layer-wise Linear Feature Connectivity
Abstract: Recent studies have revealed that GNNs are highly susceptible to backdoor attacks, which pose a significant threat to their practical application. A series of targeted defenses has been proposed in response, and these defenses mitigate the harm of backdoor attacks to some extent. But do they completely eliminate the backdoor threat? Inspired by related research on DNNs, we conduct the first systematic robustness analysis of backdoor defenses in the GNN domain. Our experiments reveal that fine-tuning a defended model for only five epochs on a small fraction of poisoned data can cause a sharp resurgence in its attack success rate (ASR), indicating that residual backdoor features persist and can be readily reactivated. Recognizing the unique message-passing paradigm of GNNs, we leverage Layer-wise Linear Feature Connectivity (LLFC) to uncover the root cause of this pronounced fragility in current graph backdoor defenses. To enhance the robustness of these defenses, we propose a novel strategy termed Layer-wise Feature Divergence (LFD), which forces the defense model to diverge from the original backdoor model by maximizing the distance between their respective layer-wise features during retraining. Extensive experiments demonstrate that LFD significantly enhances the robustness of defense models, achieving state-of-the-art defense performance while maintaining high accuracy on clean data.
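The idea of maximizing layer-wise feature distance during retraining can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the weighting, or how the term is combined with the task loss, so everything below is an assumption made for illustration: squared Euclidean distance averaged over nodes, a hypothetical weight `lam`, and the hypothetical helper names `layer_divergence` and `lfd_objective`. It is not the authors' implementation.

```python
from math import fsum


def layer_divergence(feats_a, feats_b):
    """Sum over layers of the mean squared Euclidean distance between features.

    feats_a, feats_b: one entry per layer; each entry is a node-feature
    matrix given as a list of rows (one row per node).
    Assumption: squared Euclidean distance, averaged over nodes.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both models must expose the same number of layers")
    total = 0.0
    for layer_a, layer_b in zip(feats_a, feats_b):
        squared = [
            (x - y) ** 2
            for row_a, row_b in zip(layer_a, layer_b)
            for x, y in zip(row_a, row_b)
        ]
        total += fsum(squared) / len(layer_a)  # average over nodes
    return total


def lfd_objective(task_loss, feats_defense, feats_backdoor, lam=0.1):
    """Hypothetical retraining objective to be minimized.

    Subtracting the divergence means that minimizing this value pushes
    the defense model's layer-wise features away from the backdoor model's.
    `lam` is an assumed trade-off weight, not a value from the paper.
    """
    return task_loss - lam * layer_divergence(feats_defense, feats_backdoor)


# Toy example: one layer, two nodes, two feature dimensions.
backdoor_feats = [[[0.0, 0.0], [1.0, 1.0]]]
defense_feats = [[[3.0, 4.0], [1.0, 1.0]]]
divergence = layer_divergence(defense_feats, backdoor_feats)
objective = lfd_objective(1.0, defense_feats, backdoor_feats, lam=0.1)
```

In this toy setup only the first node differs between the two models, so the divergence comes entirely from that node. A practical version would presumably compute the features with a GNN framework and may need to bound the divergence term so that it cannot dominate the task loss; those details are not given in the abstract.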
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 10644