Keywords: backdoor defense, natural language processing, deep long-tailed learning
Abstract: Language models have shown vulnerability to backdoor attacks, threatening the security of services based on them. To mitigate this threat, existing solutions attempt to search for backdoor triggers, which can be time-consuming when the search space is large. Looking into the attack process, we observe that poisoned data creates a long-tailed effect in the victim model, shifting the decision boundary toward the attack targets. Inspired by this observation, we introduce LT-Defense, the first searching-free backdoor defense that exploits the long-tailed effect. Specifically, LT-Defense employs a small set of clean examples and two metrics to distinguish backdoor-related features in the target model. Upon detecting a backdoored model, LT-Defense additionally provides test-time backdoor freezing and attack target prediction. Extensive experiments demonstrate the effectiveness of LT-Defense in both detection accuracy and efficiency, e.g., in task-agnostic scenarios, LT-Defense achieves 98% accuracy across 1440 models with less than 1% of the time cost of state-of-the-art solutions.
Primary Area: Safety in machine learning
Submission Number: 18435
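The abstract's core intuition is that poisoning skews the victim model's predictions toward the attack target, producing a "head" class observable even on clean inputs. Below is a minimal, hypothetical sketch of that detection idea: the metric names (`dominance`, `margin_shift`), the threshold, and the decision rule are illustrative assumptions, not the paper's actual two metrics.

```python
# Hypothetical sketch: detect the long-tailed effect of a backdoored
# model using only a small set of clean examples, as the abstract
# describes at a high level. Metric names and threshold are assumed.
import numpy as np

def detect_head_classes(logits: np.ndarray, threshold: float = 3.0):
    """logits: (n_clean_examples, n_classes) outputs of the suspect model."""
    n, c = logits.shape
    preds = logits.argmax(axis=1)

    # Assumed metric 1: dominance -- how often each class wins,
    # relative to the uniform expectation of n / c predictions.
    counts = np.bincount(preds, minlength=c)
    dominance = counts / (n / c)

    # Assumed metric 2: margin shift -- average logit advantage of
    # each class over the per-example runner-up logit.
    runner_up = np.sort(logits, axis=1)[:, -2]
    margin_shift = np.array(
        [(logits[:, k] - runner_up).mean() for k in range(c)]
    )

    # A class that is both over-predicted and over-separated is a
    # candidate attack target (the "head" of the long tail).
    suspects = np.where((dominance > threshold) & (margin_shift > 0))[0]
    return suspects, dominance, margin_shift

# Toy usage: a clean model spreads predictions roughly uniformly,
# while a poisoned one concentrates them on the target class.
rng = np.random.default_rng(0)
clean_logits = rng.normal(size=(200, 4))
poisoned_logits = clean_logits.copy()
poisoned_logits[:, 2] += 2.5          # simulate boundary shift to class 2
print(detect_head_classes(poisoned_logits)[0])  # -> [2]
```

Under these assumptions, no trigger search is needed: the check runs one forward pass over the clean set, which is consistent with the efficiency claim in the abstract.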