Defense Against Multi-target Multi-trigger Backdoor Attacks

Published: 01 Jan 2025, Last Modified: 31 Jul 2025, PAKDD (6) 2025, CC BY-SA 4.0
Abstract: Neural backdoor attacks present a critical vulnerability in deep-learning systems. In this work, we introduce a more general, multi-target multi-trigger form of trigger-based attack that bypasses existing defense methods. To counter this, we propose a novel defense mechanism grounded in an information-theoretic analysis of how backdoors are most efficiently encoded in the feature space. Specifically, we prove that the most information-efficient way to represent multiple trigger-based backdoors is to construct layered manifolds, where each layer corresponds to a unique backdoor trigger and the separation between layers is encoded radially (assuming the usual high-dimensional hyper-spherical data distribution). This structure enables the main classifier to maintain its original classification boundary while accessing backdoors through simple radial separations. Our defense exploits this insight by searching for perturbations that exhibit global behaviour, which is indicative of such layered manifolds and, therefore, of backdoors. Extensive experiments on a variety of image datasets demonstrate that our method successfully identifies backdoors that are missed by state-of-the-art detection methods.
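To illustrate the kind of search the abstract describes, the sketch below probes a classifier for perturbations with global behaviour: a single small perturbation is optimised so that it redirects an entire batch of clean inputs to one class, and a high flip rate under a tiny perturbation budget is read as evidence of a backdoor-style shortcut. This is a minimal PyTorch sketch under stated assumptions, not the paper's actual algorithm; the function name `global_perturbation_score`, the budget `epsilon`, and the optimisation details are illustrative choices.

```python
import torch
import torch.nn.functional as F

def global_perturbation_score(model, images, target_class,
                              epsilon=0.05, steps=100, lr=0.01):
    """Optimise ONE small perturbation shared by the whole batch and report
    how many inputs it pushes into `target_class`. A high flip rate at a
    tiny perturbation budget is treated here as the "global behaviour"
    that hints at a backdoor-like shortcut in feature space."""
    # One delta of shape [1, C, H, W], broadcast over the whole batch.
    delta = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.full((images.size(0),), target_class,
                        dtype=torch.long, device=images.device)

    for _ in range(steps):
        logits = model((images + delta).clamp(0.0, 1.0))
        loss = F.cross_entropy(logits, target)  # pull every input toward one label
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)     # keep the perturbation small

    with torch.no_grad():
        preds = model((images + delta).clamp(0.0, 1.0)).argmax(dim=1)
        return (preds == target).float().mean().item()  # fraction of inputs flipped

# Usage sketch: scan every class and flag those where a tiny shared
# perturbation flips most clean inputs (threshold is an assumption).
# scores = [global_perturbation_score(model, clean_batch, c)
#           for c in range(num_classes)]
# suspicious = [c for c, s in enumerate(scores) if s > 0.9]
```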