RobustEncoder: Leveraging K-Means clustering to defend NLP models against backdoor attacks
Abstract: As machine learning (ML) systems become increasingly integrated into real-world applications for sensitive tasks, ensuring the security and privacy of these models becomes paramount. Deep Neural Networks (DNNs), in particular, are susceptible to backdoor attacks, in which adversaries manipulate training data by inserting specially crafted samples. While the NLP community has extensively studied backdoor attacks, there remains a gap in effective defense mechanisms. To address this, we propose RobustEncoder, a novel approach that leverages K-Means clustering to detect and mitigate backdoor attacks in text-based models. Extensive empirical evaluations show that our method is highly effective at identifying and neutralizing backdoor triggers. Additionally, we outline potential applications of blockchain technology to further strengthen the security and integrity of such defenses in future implementations.
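To make the core idea concrete, the following is a minimal, illustrative sketch of clustering sentence embeddings with K-Means and flagging unusually small clusters as potentially poisoned; it is not the paper's exact RobustEncoder pipeline. The encoder choice (all-MiniLM-L6-v2), the cluster count, and the size threshold are assumptions made for this example.

```python
# Illustrative sketch only: the exact RobustEncoder procedure is described in
# the paper body. Encoder, cluster count, and threshold below are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer


def flag_suspicious_samples(texts, n_clusters=10, min_cluster_fraction=0.02):
    """Embed texts, cluster the embeddings with K-Means, and return indices of
    samples that fall into unusually small clusters (candidate poisoned data)."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
    embeddings = encoder.encode(texts, convert_to_numpy=True)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(embeddings)

    # Backdoor triggers tend to pull poisoned samples into tight, sparsely
    # populated clusters, so clusters holding only a small fraction of the
    # data are treated as suspicious.
    counts = np.bincount(labels, minlength=n_clusters)
    suspicious_clusters = set(np.where(counts < min_cluster_fraction * len(texts))[0])
    return [i for i, lbl in enumerate(labels) if lbl in suspicious_clusters]
```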