$D^3$: Detoxing Deep Learning Dataset

Published: 28 Oct 2023, Last Modified: 13 Mar 2024
NeurIPS 2023 BUGS Poster
Keywords: Backdoor attack, data poisoning, image classification
TL;DR: We propose a new post-mortem dataset detoxing technique, based on a novel differential analysis to extract triggers, combined with data augmentation.
Abstract: Data poisoning is a prominent threat to Deep Learning applications. In a backdoor attack, training samples are poisoned with a specific input pattern or transformation, called a trigger, such that the trained model misclassifies inputs in the presence of the trigger. Despite a broad spectrum of defense techniques against data poisoning and backdoor attacks, these defenses are often outpaced by the increasing complexity and sophistication of attacks. In response to this growing threat, this paper introduces $D^3$, a novel dataset detoxification technique that leverages differential analysis to extract triggers from compromised test samples captured in the wild. Specifically, we formulate the challenge of poison extraction as a constrained optimization problem and solve it with iterative gradient descent under semantic restrictions. Upon successful extraction, $D^3$ enhances the dataset by stamping the poison onto clean validation samples and builds a classifier to separate clean and poisoned training samples. This post-mortem approach provides a robust complement to existing defenses, particularly when they fail to detect complex, stealthy poisoning attacks. $D^3$ is evaluated on 42 poisoned datasets with 18 different types of poisons, including subtle clean-label, dynamic, and input-aware attacks. It achieves over 95\% precision and 95\% recall on average, substantially outperforming the state-of-the-art.
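The two-stage pipeline sketched in the abstract — extract a trigger by constrained gradient descent, then stamp it onto clean samples to build a clean/poisoned separator — can be illustrated on a toy model. This is a minimal sketch under heavy assumptions: a softmax-linear "victim" model with full gradient access, an additive trigger, and an L-inf bound standing in for the paper's semantic restrictions; none of the names or choices below come from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable victim model: a softmax-linear classifier
# (an illustrative stand-in, not the paper's setting).
W = rng.normal(size=(2, 8))  # 2 classes, 8 input features

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(x):
    return softmax(W @ x)

def extract_trigger(x_clean, target=1, eps=0.5, lr=0.5, steps=200):
    """Recover an additive trigger delta pushing x_clean toward `target`,
    subject to an L-inf constraint (projected gradient descent)."""
    delta = np.zeros_like(x_clean)
    for _ in range(steps):
        p = predict(x_clean + delta)
        # gradient of -log p[target] w.r.t. the input, for softmax-linear
        grad = W.T @ (p - np.eye(2)[target])
        delta -= lr * grad                 # gradient descent step
        delta = np.clip(delta, -eps, eps)  # project onto the constraint set
    return delta

x_clean = rng.normal(size=8)
delta = extract_trigger(x_clean)

# Detox stage sketch: stamp the extracted trigger onto clean validation
# samples, then separate poisoned from clean samples; here a simple
# correlation-with-delta score plays the role of the learned classifier.
clean_val = rng.normal(size=(50, 8))
stamped = clean_val + delta
score = clean_val @ delta, stamped @ delta  # poisoned samples score higher
```

The projection step is what makes the optimization "constrained": after each gradient update, delta is clipped back into the feasible region, so the recovered trigger stays small enough to be stamped onto clean samples without destroying their semantics.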
Submission Number: 12