RobustEncoder: Leveraging K-Means clustering to defend NLP models against backdoor attacks

Published: 01 Jan 2024 · Last Modified: 12 May 2025 · BCCA 2024 · CC BY-SA 4.0
Abstract: As machine learning (ML) systems become increasingly integrated into real-world applications for sensitive tasks, ensuring the security and privacy of these models becomes paramount. Deep Neural Networks (DNNs), in particular, are susceptible to backdoor attacks, where adversaries manipulate training data by inserting specially crafted samples. While the NLP community has extensively studied backdoor attacks, there remains a gap in effective defense mechanisms. To address this, we propose RobustEncoder, a novel approach leveraging K-Means clustering to detect and mitigate backdoor attacks in text-based models. Our method demonstrates significant efficacy in identifying and neutralizing backdoor triggers, as evidenced by extensive empirical evaluations. Additionally, we propose potential applications of blockchain technology to further enhance the security and integrity of the defense mechanisms in future implementations.
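The core idea described in the abstract — using K-Means clustering to separate trigger-bearing samples from clean ones — can be illustrated with a minimal sketch. The synthetic embeddings, the choice of k=2, and the smaller-cluster heuristic below are illustrative assumptions, not the paper's actual pipeline or data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical setup: 200 clean sentence embeddings and 20 poisoned ones.
# Poisoned embeddings are simulated as a tight, offset cluster, reflecting
# the assumption that a shared trigger dominates their representations.
clean = rng.normal(0.0, 1.0, size=(200, 32))
poisoned = rng.normal(5.0, 0.1, size=(20, 32))
embeddings = np.vstack([clean, poisoned])

# Cluster the embeddings; with k=2, trigger-bearing samples should
# separate from the clean distribution.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# Flag the smaller cluster as suspicious, since poisoned data is
# typically a small minority of the training set.
counts = np.bincount(labels)
suspect_label = int(np.argmin(counts))
suspect_idx = np.flatnonzero(labels == suspect_label)
print(len(suspect_idx))
```

In this toy setting, the flagged indices coincide with the injected poisoned samples; a real defense would operate on embeddings from the trained encoder rather than synthetic vectors.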